Abstract

In this paper, we consider different aspects of the problem of compressing for function computation 
across a network, which we call network functional compression. In network functional compression, 
computation of a function (or, some functions) of sources located at certain nodes in a network is desired 
at receiver(s), that are other nodes in the network. The rate region of this problem has been considered 
in the literature under certain restrictive assumptions, particularly in terms of the network topology, the 
functions and the characteristics of the sources. In this paper, we present results that significantly relax 
these assumptions. Firstly, we consider this problem for an arbitrary tree network and asymptotically 
lossless computation and derive rate lower bounds. We show that, for depth one trees with correlated 
sources, or for general trees with independent sources, a modularized coding scheme based on graph 
colorings and Slepian-Wolf compression performs arbitrarily closely to rate lower bounds. For a general 
tree network with independent sources, optimal computation to be performed at intermediate nodes is 
derived. We show that, for a family of functions and random variables called chain rule proper sets, 
computation at intermediate nodes is not necessary. We introduce a new condition on colorings of source 
random variables' characteristic graphs called the coloring connectivity condition (C.C.C.). We show
that this condition is necessary and sufficient for any achievable coding scheme based on colorings,
thus relaxing the previous sufficient zig-zag condition of Doshi et al. We also show that, unlike entropy,
graph entropy does not satisfy the chain rule. 

Secondly, we consider a multi-functional version of this problem with side information, where 
the receiver wants to compute several functions, with different side information random variables. We 
derive a rate region and propose a coding scheme based on graph colorings for this problem. Thirdly, 
we consider the functional compression problem with feedback. We show that, in this problem, unlike 
Slepian-Wolf compression, by having feedback, one may outperform rate bounds of the case without
feedback. These results extend those of Bakshi et al. Fourthly, we investigate the problem of distributed
functional compression with distortion, where computation of a function within a distortion level is 
desired at the receiver. We compute a rate-distortion region for this problem. Then, we propose a 
simple suboptimal coding scheme with a non-trivial performance guarantee. 

Our coding schemes are based on finding the minimum entropy coloring of the characteristic graph 
of the function we seek to compute. In general, it is shown by Cardinal et al. that finding this coloring 
is an NP-hard problem. However, we show that, depending on the characteristic graph's structure, there 
are some interesting cases where finding the minimum entropy coloring is not NP-hard, but tractable 
and practical. In one of these cases, we show that, by having a non-zero joint probability condition 
on random variables' distributions, for any desired function, finding the minimum entropy coloring can 
be solved in polynomial time. In another case, we show that, if the desired function is a quantization 
function with a certain structure, this problem is also tractable. 

Index Terms 

Functional Compression, Distributed Computation, Slepian-Wolf Compression, Graph Entropy, Graph 
Coloring, Feedback, Distortion. 

I. Introduction 

In this paper, we consider different aspects of the functional compression problem over 
networks. In the functional compression problem, we would like to compress source random 
variables for the purpose of computing a deterministic function (or some deterministic functions) 
at the receiver(s) when these sources and receivers are nodes in a network. Traditional data 
compression schemes are special cases of functional compression, where their desired function 
is the identity function. However, if the receiver is interested in computing a function (or some 
functions) of sources, further compressing is possible. In the rest of this section, we review 
some prior relevant research and illustrate some research challenges of this problem through 
some motivating examples which will be discussed in the following sections. 

A. Prior Work in Functional Compression 

We categorize prior work into the study of lossless functional compression and that of functional
compression with distortion.

Fig. 1. a) Functional compression with side information b) A distributed functional compression problem with two transmitters
and a receiver c) An achievable encoding/decoding scheme for the functional compression.



1) Lossless Functional Compression: By lossless computation, we mean asymptotically lossless
computation of a function: the error probability goes to zero as the block length goes to
infinity.

First, consider the network topology depicted in Figure 1-a, which has two sources and a
receiver. One of the sources is available at the receiver as side information. Shannon was the
first to consider this problem, in [1], for the special case $f(X_1,X_2) = (X_1,X_2)$
(the identity function). For a general function, Orlitsky and Roche provided a single-letter
characterization in [2]. In [3], Doshi et al. proposed an optimal coding scheme for this problem.

Now, consider the network topology depicted in Figure 1-b, which has two sources and a
receiver. This problem is a distributed compression problem. For the case that the desired function
at the receiver is the identity function (i.e., $f(X_1, X_2) = (X_1, X_2)$), Slepian and Wolf provided a
characterization of the rate region and an optimal achievable coding scheme in [4]. Some other
practical but suboptimal coding schemes have been proposed by Pradhan and Ramchandran in
[5]. Also, a rate-splitting technique for this problem was developed by Coleman et al. in [6]. Special
cases when $f(X_1,X_2) = X_1$ and $f(X_1,X_2) = (X_1 + X_2) \bmod 2$ have been investigated by
Ahlswede and Korner in [7], and Korner and Marton in [8], respectively. Under some special
conditions on source distributions, Doshi et al. in [3] investigated this problem for a general
function and proposed some achievable coding schemes.

Fig. 2. A general one-stage tree network with a desired function at the receiver.

Sections II, III and IV of this paper consider different aspects of this problem (asymptotically
lossless functional compression). In particular, we answer the following questions
in these sections:

• For a depth one tree network with one desired function at the receiver (as shown in Figure
2), what is a necessary and sufficient condition for any coding scheme to guarantee that the
network is solvable (i.e., the receiver is able to compute its desired function)?

• What is a rate region of the functional compression problem for a depth one tree network 
(a rate region is a set of rates for different links of the network under which the network is 
solvable)? How can a modularized coloring-based coding scheme perform arbitrarily closely 
to rate bounds? 

• For a general tree network with one desired function at the receiver (as shown in Figure
3), when do intermediate nodes need to perform computation and what is an optimal
computation to be performed? What is a rate region for this network?

• How do results extend to the case of having several desired functions with side information
at the receiver?

• What happens if we have feedback in our system? 

2) Functional Compression with Distortion: In this section, we review prior results in functional
compression for the case of being allowed to compute the desired function at the receiver
within a distortion level.

Fig. 3. An arbitrary tree network topology.

First, consider the network topology depicted in Figure 1-a, with side information at the
receiver. Wyner and Ziv [9] considered this problem for computing the identity function at the
receiver with distortion $D$. Yamamoto solved this problem for a general function $f(X_1,X_2)$ in
[10]. Doshi et al. gave another characterization of the rate-distortion function given by Yamamoto
in [3]. Feng et al. [11] considered the side information problem for a general function at the
receiver in the case where the encoder and decoder have some noisy information.

For the network topology depicted in Figure 1-b and for a general function, the rate-distortion region
has been unknown, but some bounds have been given by Berger and Yeung [12], Barros and
Servetto [13], and Wagner et al. [14], who considered a specific quadratic distortion function.

In Section V of this paper, we answer the following questions:

• What is a multi-letter characterization of a rate-distortion function for the distributed network
depicted in Figure 1-b?

• For this problem, is there a simple suboptimal coding scheme based on graph colorings 
with a non-trivial performance guarantee? 

Remarks: 

• Our coding schemes in Sections II-V are based on finding a minimum entropy coloring of a
characteristic graph of the function we seek to compute. In general, reference [15] showed
that finding this coloring is an NP-hard problem. However, in Section VI, we consider
whether there are some functions and/or source structures which lead to easy and practical
coding schemes.

• Note that our work is different in techniques and problem setup from multi-round
function computation (e.g., [16] and [17]). Also, some references consider the functional
computation problem for specific functions. For example, [19] investigated computation of
symmetric Boolean functions in tree networks, and [20] and [18] studied the sum-network
with three sources and three terminals. Note that, in our problem setup, the desired function
at the receiver is an arbitrary function. Also, we are interested in asymptotically lossless or
lossy computation of this function.

In the rest of this section, we explain some research challenges of the functional compression 
problem by some examples. In the next sections, we explain these issues with more detail. 

B. Problem Outline 

In this section, we outline some of the main issues in functional compression. We use
simple examples to illustrate these issues, which are explained in more detail later in this paper.
Let us proceed with an example:

Example 1. Consider the network shown in Figure 1-b, which has two source nodes and a
receiver. Suppose source nodes have two independent source random variables (RVs) $X_1$ and
$X_2$ such that $X_1$ takes values from the set $\mathcal{X}_1 = \{x_1^1, x_1^2, x_1^3, x_1^4\} = \{0,1,2,3\}$, and $X_2$ takes
values from the set $\mathcal{X}_2 = \{x_2^1, x_2^2\} = \{0, 1\}$, both with equal probability. Values of $x_i^j$ for
different $i$ and $j$ are shown in Figure 4. Suppose the receiver desires to compute the function
$f(X_1,X_2) = (X_1+X_2) \bmod 2$.

If $X_1 = 0$ or $X_1 = 2$, for all possible values of $X_2$, we have $f(X_1,X_2) = X_2$. Hence,
we do not need to distinguish between $X_1 = 0$ and $X_1 = 2$. A similar argument holds for
$X_1 = 1$ and $X_1 = 3$. However, the cases $X_1 = 0$ and $X_1 = 1$ should be distinguished, because
for $X_2 = 0$, the function value when $X_1 = 0$ differs from the one when $X_1 = 1$ (i.e.,
$f(0,0) = 0 \neq f(1,0) = 1$).

We notice that for each source random variable, depending on the function at the receiver and
values of the other source random variable, we should distinguish some possible pair values.

Fig. 4. Characteristic graphs described in Example 1: a) $G_{X_1}$, b) $G_{X_2}$.

In other words, values of source random variables which potentially can cause confusion at 
the receiver should be assigned to different codes. To determine which pair values of a random 
variable should be assigned to different codes, we make a graph for each random variable, called 
the characteristic graph or the confusion graph of that random variable ([[II, EH)- Vertices of 
this graph are different possible values of that random variable. We connect two vertices if they 
should be distinguished. For the problem described in Example 1, the characteristic graph of $X_1$
(called $G_{X_1}$) is depicted in Figure 4-a. Note that we have not connected vertices which lead to
the same function value for all values of $X_2$. The characteristic graph of $X_2$ ($G_{X_2}$) is shown in
Figure 4-b.

Now, we seek to assign different codes to connected vertices, which corresponds to a graph
coloring where we assign different colors (codes) to connected vertices. Vertices that are not
connected to each other can be assigned the same or different colors (codes). Figure 5-(a,b)
shows valid colorings for $G_{X_1}$ and $G_{X_2}$.

Now, we propose a possible coding scheme for this example. First, we choose valid colorings
for $G_{X_1}$ and $G_{X_2}$. Instead of sending source random variables, we send these coloring random
variables. At the receiver side, we use a look-up table to compute the desired function value by
using the received colorings. Figure 5 demonstrates this coding scheme.
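The coloring-based scheme for Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the color labels and the names `color_x1`, `color_x2`, `build_lookup_table` and `decode` are hypothetical and not taken from the paper's figures. The sources send colors, and the receiver recovers the function value from a look-up table indexed by the pair of received colors.

```python
from itertools import product

X1 = [0, 1, 2, 3]
X2 = [0, 1]

def f(x1, x2):
    """Desired function of Example 1."""
    return (x1 + x2) % 2

# Assumed colorings (labels are hypothetical): values of X1 with the same
# parity are not connected in G_{X_1}, so they may share a color; the two
# values of X2 are connected in G_{X_2}, so they need distinct colors.
color_x1 = {0: "r", 2: "r", 1: "b", 3: "b"}
color_x2 = {0: "y", 1: "g"}

def build_lookup_table():
    """Map each pair of colors to the function value it determines."""
    table = {}
    for x1, x2 in product(X1, X2):
        key = (color_x1[x1], color_x2[x2])
        value = f(x1, x2)
        # For an achievable scheme, a color pair never maps to two values.
        if table.setdefault(key, value) != value:
            raise ValueError(f"color pair {key} is ambiguous")
    return table

lookup = build_lookup_table()

def decode(c1, c2):
    """Receiver side: compute f from the received colors only."""
    return lookup[(c1, c2)]
```

In this sketch, the first source needs to describe one of two colors instead of one of four values, which is where the saving over sending $X_1$ itself comes from.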

However, this coloring-based coding scheme is not necessarily an achievable scheme. In other
words, if we send coloring random variables instead of source random variables, the receiver
may not be able to compute its desired function. Hence, we need some conditions to guarantee
the achievability of coloring-based coding schemes. We explain this required condition by an
example.

Fig. 5. a) $G_{X_1}$, b) $G_{X_2}$, and c) a decoding look-up table for Example 1. (Different letters written over graph vertices indicate
different colors.)

Example 2. Consider the same network topology as explained in Example 1, shown in Figure
1-b. Suppose $\mathcal{X}_1 = \{0, 1\}$ and $\mathcal{X}_2 = \{0, 1\}$. The function values are depicted in Figure 6-a. In
particular, $f(0,0) = 0$ and $f(1,1) = 1$. Dark squares in this figure represent points with zero
probability. Figure 6-b demonstrates characteristic graphs of these source random variables.
Each has two vertices, not connected to each other. Hence, we can assign them the same color.
Figure 6-b shows these valid colorings for $G_{X_1}$ and $G_{X_2}$. However, one may notice that if we
send these coloring random variables instead of source random variables, the receiver would
not be able to compute its desired function.
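The failure in Example 2 can be checked mechanically. The sketch below assumes that the zero-probability points are $(0,1)$ and $(1,0)$ and that the two remaining points are equally likely; the text only states $f(0,0) = 0$ and $f(1,1) = 1$, so this distribution, as well as the helper names `has_edge` and `coloring_is_decodable`, are assumptions made for illustration.

```python
# Assumed joint distribution: only (0, 0) and (1, 1) have positive probability.
p = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
# Function values are only needed on the support of p.
f = {(0, 0): 0, (1, 1): 1}

def has_edge(source, a, b):
    """Edge test of the characteristic graph of X1 (source=1) or X2 (source=2)."""
    for other in (0, 1):
        pa = (a, other) if source == 1 else (other, a)
        pb = (b, other) if source == 1 else (other, b)
        if p[pa] > 0 and p[pb] > 0 and f[pa] != f[pb]:
            return True
    return False

def coloring_is_decodable(c1, c2):
    """True if every received color pair determines a single function value."""
    seen = {}
    for (x1, x2), prob in p.items():
        if prob == 0:
            continue
        key = (c1[x1], c2[x2])
        if seen.setdefault(key, f[(x1, x2)]) != f[(x1, x2)]:
            return False
    return True

same_color = {0: "a", 1: "a"}       # valid: the two vertices are not connected
distinct_colors = {0: "a", 1: "b"}
```

Under these assumptions, both characteristic graphs have no edges, so `same_color` is a valid coloring for each source, yet the receiver cannot decode from the pair of colors. Using `distinct_colors` for one source restores decodability.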

Example 2 demonstrates a case where a coloring-based coding scheme fails to be an achievable
scheme. Thus, we need a condition to avoid these situations. We investigate this necessary and
sufficient condition in Section II. We call this condition the coloring connectivity condition or
C.C.C. The situation of Example 2 happens when we have a disconnected coloring class (a
coloring class is a set of source pairs with the same color in each coordinate). C.C.C. is a
condition to avoid this situation.

Fig. 6. An example for colorings not satisfying C.C.C. (Different letters written over graph vertices indicate different colors.)

Hence, an achievable coding scheme can be expressed as follows. Sources send, instead of 
source random variables, colorings of their random variables which satisfy C.C.C. Then, they 
perform source coding on these coloring random variables. Each receiver, by using these colors 
and a look-up table, can compute its desired function. 

However, we may need to consider coloring schemes of conflict graphs of vector extensions 
of the desired function. In the following, we explain this approach by an example: 

Example 3. Consider the network shown in Figure 1-b. Suppose $X_1$ is uniformly distributed
over $\mathcal{X}_1 = \{0,1,2,3,4\}$. Consider $X_2$ and $f(X_1,X_2)$ such that we have the graph depicted in
Figure 7 for $G_{X_1}$. Figure 7 also demonstrates a valid coloring for this graph. Let us call this
coloring random variable $c_{G_{X_1}}$. Hence, we have $H(c_{G_{X_1}}) \approx 1.52$. Now, instead of $X_1$, suppose
we encode $X_1 \times X_1$ (i.e., $X_1^2$), a random variable with 25 possibilities ($\{00, 01, \ldots, 44\}$). To make
its characteristic graph, we connect two vertices when at least one of their coordinates is
connected in $G_{X_1}$. Figure 8 illustrates the characteristic graph of $X_1^2$ (referred to as $G_{X_1}^2$ and
called the second power of the graph $G_{X_1}$). A valid coloring of this graph, called $c_{G_{X_1}^2}$, is shown
in this figure. One may notice that we use eight colors to color this graph. We have,

Fig. 7. $G_{X_1}$ described in Example 3. (Different letters written over graph vertices indicate different colors.)

Fig. 8. $G_{X_1}^2$, the second power graph of $G_{X_1}$, described in Example 3. Letters $a_1, \ldots, a_8$ written over graph vertices indicate
different colors. Two subsets of vertices are fully connected if each vertex of one set is connected to every vertex in the other
set.

Example 3 demonstrates the fact that if we assign colors to a sufficiently large power graph
of $G_{X_1}$, we can compress source random variables more. In Section II, we show that sending
colorings of sufficiently large power graphs of characteristic graphs which satisfy C.C.C., followed
by source coding (such as Slepian-Wolf compression), leads to an achievable coding scheme.
On the other hand, any achievable coding scheme for this problem can be viewed as a coloring-based
coding scheme satisfying C.C.C. In Section II, we shall explain these concepts in more
detail.

Now, by another example, we explain some issues of the functional compression problem over
tree networks.

Example 4. Consider the network topology depicted in Figure 9. This is a tree network with
four sources, two intermediate nodes and a receiver. Suppose source random variables are
independent, with equal probability to be zero or one. In other words, $\mathcal{X}_i = \{0,1\}$ for $i = 1,2,3,4$.
Suppose the receiver wants to compute a parity check function
$f(X_1,X_2,X_3,X_4) = (X_1 + X_2 + X_3 + X_4) \bmod 2$. Intermediate nodes are allowed to perform computation.

In Example 4, first notice that characteristic graphs of source random variables are complete
graphs. Hence, coloring random variables of sources are equal to source random variables. If
intermediate nodes act like relays (i.e., no computations are performed at intermediate nodes),
the following set of rates is achievable:

$$\begin{aligned} R_{2j} &\geq 1 \quad \text{for } 1 \leq j \leq 4, \\ R_{1j} &\geq 2 \quad \text{for } 1 \leq j \leq 2, \end{aligned} \tag{2}$$

where $R_{ij}$ are rates of different links depicted in Figure 9.

Fig. 9. An example of a two stage tree network.

However, suppose intermediate nodes perform some computations. Assume source nodes send
their coloring random variables satisfying C.C.C. (in this case, they are equal to source random
variables because characteristic graphs are complete). Then, each intermediate node makes its
own characteristic graph and by using the received colors, picks a corresponding color for its
own characteristic graph and sends that color. The receiver, by using the received colors of
intermediate nodes' characteristic graphs and a look-up table, can compute its desired function.
Figure 10 demonstrates this encoding/decoding scheme. For this example, intermediate nodes
need to transmit one bit. Therefore, the following set of rates is achievable:

$$R_{ij} \geq 1 \tag{3}$$

for all possible $i$ and $j$. Note that, in Example 4, by allowing intermediate nodes to
compute, we can reduce transmission rates of some links. This problem is investigated in Section
II for a tree network, where the optimal computation to be performed at intermediate nodes is derived.
We also show that for a family of functions and source random variables, intermediate nodes
do not need to perform computation and acting like relays is an optimal operation for them.
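The rate saving in Example 4 can be illustrated with a small sketch. It assumes, since the figure is not reproduced here, that the first intermediate node is connected to sources 1 and 2 and the second to sources 3 and 4; the function names `relay_scheme` and `computing_scheme` are hypothetical. Each scheme returns the value computed at the receiver together with the number of bits sent per source symbol on each intermediate link.

```python
from itertools import product

def parity(*bits):
    """Parity check function (sum modulo 2)."""
    return sum(bits) % 2

def relay_scheme(x):
    """Intermediate nodes forward both inputs: 2 bits per intermediate link."""
    m1 = (x[0], x[1])
    m2 = (x[2], x[3])
    return parity(*m1, *m2), 2

def computing_scheme(x):
    """Intermediate nodes send the parity of their inputs: 1 bit per link."""
    m1 = parity(x[0], x[1])
    m2 = parity(x[2], x[3])
    return parity(m1, m2), 1

all_inputs = list(product([0, 1], repeat=4))
```

Both schemes recover the parity of the four source bits for every input, but the computing scheme halves the load on the links leaving the intermediate nodes, in line with the rates in (2) and (3).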

The problem of having different desired functions at the receiver with side information is
considered in Section III. For this problem, instead of a characteristic graph, we compute a new
graph, called a multi-functional characteristic graph. This graph is basically an OR function of
individual characteristic graphs with respect to different functions. In that section, we find a rate
region and propose an achievable coding scheme for this problem.
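As a rough illustration of the OR construction, the sketch below builds the characteristic graph of $X_1$ for two functions and takes the union of their edge sets. The two functions `f1` and `f2`, the use of a single side-information variable for both, and the assumption that every pair of values has positive probability are simplifications chosen for this example and are not taken from the paper.

```python
from itertools import combinations

X1 = range(4)          # source alphabet
SIDE = range(2)        # side-information alphabet (shared here for simplicity)

def characteristic_edges(func):
    """Edges of G_{X_1} for `func`, assuming all pairs have positive probability."""
    return {
        (a, b)
        for a, b in combinations(X1, 2)
        if any(func(a, s) != func(b, s) for s in SIDE)
    }

def f1(x1, s):
    return (x1 + s) % 2

def f2(x1, s):
    return (x1 // 2 + s) % 2

edges_f1 = characteristic_edges(f1)
edges_f2 = characteristic_edges(f2)

# Two values must be distinguished if ANY of the desired functions requires it.
multi_functional_edges = edges_f1 | edges_f2
```

In this toy case each individual graph can be colored with two colors, while the union is the complete graph on four vertices, so serving both functions at once costs more than serving either one alone.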

The effect of feedback on the rate region of the functional compression problem is investigated in
Section IV. If the function at the receiver is the identity function, this problem is Slepian-Wolf
compression with feedback. For this case, having feedback does not give us any gain in terms
of the rate. For example, reference [22] considers both zero-error and asymptotically zero-error
Slepian-Wolf compression with feedback. However, this is not the case when we have a general
function $f$ at the receiver. By having feedback, one may outperform rate bounds of the case
without feedback.

Fig. 10. Characteristic graphs and a decoding look-up table for Example 4.

We consider the problem of distributed functional compression with distortion in Section
V. The objective is to compress correlated discrete sources so that an arbitrary deterministic
function of those sources can be computed at the receiver within a distortion level. For this case,
we compute a rate-distortion region and propose an achievable coding scheme.

In our proposed coding schemes for different functional compression problems, one needs to
compute the minimum entropy coloring (a coloring random variable which minimizes entropy)
of a characteristic graph. In general, finding this coloring is an NP-hard problem ([15]). However,
in Section VI, we show that, depending on the characteristic graph's structure, there are some
interesting cases where finding a minimum entropy coloring is not NP-hard, but tractable and
practical. Conclusions and future work are presented in Section VII.

II. Functional Compression Over Tree Networks 

In this section, we consider the problem of functional compression for an arbitrary tree 
network. Suppose we have k possibly correlated source processes in a tree network, and a 
receiver at its root wishes to compute a deterministic function of these processes. Other nodes 
of this tree (called intermediate nodes) are allowed to perform computation to satisfy the node's 
demand. For this problem, we find a rate region (i.e., feasible rates for different links) of this 
network when sources are independent and a rate lower bound when sources are correlated. 

The rate region of the functional compression problem has been an open problem. However, it
has been solved for some simple networks under some special conditions. For instance, reference [3]
considered a rate region of a network with two transmitters and a receiver under a condition on
source random variables. Here, we derive a rate lower bound for an arbitrary tree network based
on the graph entropy. We introduce a new condition on colorings of source random variables'
characteristic graphs called the coloring connectivity condition (C.C.C.). We show that, unlike
the condition used in [3], this condition is necessary and sufficient for any achievable coding
scheme. We also show that, unlike entropy, graph entropy does not satisfy the chain rule. For
one stage trees with correlated sources, and general trees with independent sources, we propose 
a modularized coding scheme based on graph colorings to perform arbitrarily closely to this rate 
lower bound. We show that in a general tree network case with independent sources, to achieve 
the rate lower bound, intermediate nodes should perform computation. However, for a family 
of functions and random variables, which we call chain-rule proper sets, it is sufficient to have 
intermediate nodes act like relays to perform arbitrarily closely to the rate lower bound. 

In this section, after giving the problem statement and reviewing previous results, we explain 
our main contributions in this problem. 

A. Problem Setup 

Consider $k$ discrete memoryless random processes, $\{X_1^i\}_{i=1}^{\infty}, \ldots, \{X_k^i\}_{i=1}^{\infty}$, as source processes.

Memorylessness is not necessary, and one can approximate a source by a memoryless one with an
arbitrary precision [23]. Suppose these sources are drawn from finite sets $\mathcal{X}_1 = \{x_1^1, x_1^2, \ldots, x_1^{|\mathcal{X}_1|}\}$, ...,
$\mathcal{X}_k = \{x_k^1, x_k^2, \ldots, x_k^{|\mathcal{X}_k|}\}$. These sources have a joint probability distribution $p(x_1, \ldots, x_k)$. We
express $n$-sequences of these random variables as $\mathbf{X}_1 = \{X_1^i\}_{i=l}^{i=l+n-1}, \ldots, \mathbf{X}_k = \{X_k^i\}_{i=l}^{i=l+n-1}$
with the joint probability distribution $p(\mathbf{x}_1, \ldots, \mathbf{x}_k)$. Without loss of generality, we assume $l = 1$,
and to simplify notation, $n$ will be implied by the context if no confusion arises. We refer to
the $i$-th element of $\mathbf{x}_j$ as $x_{ji}$. We use $\mathbf{x}_j^1, \mathbf{x}_j^2, \ldots$ as different $n$-sequences of $\mathbf{X}_j$. We shall omit the
superscript when no confusion arises. Since the sequence $(\mathbf{x}_1, \ldots, \mathbf{x}_k)$ is drawn i.i.d. according
to $p(x_1, \ldots, x_k)$, one can write $p(\mathbf{x}_1, \ldots, \mathbf{x}_k) = \prod_{i=1}^{n} p(x_{1i}, \ldots, x_{ki})$.

Consider a tree network shown in Figure 3. Suppose we have $k$ source nodes in this network
and a receiver at its root. We refer to other nodes of this tree as intermediate nodes. Source
node $j$ has an input random process $\{X_j^i\}_{i=1}^{\infty}$. The receiver wishes to compute a deterministic
function $f : \mathcal{X}_1 \times \cdots \times \mathcal{X}_k \to \mathcal{Z}$, or $f : \mathcal{X}_1^n \times \cdots \times \mathcal{X}_k^n \to \mathcal{Z}^n$, its vector extension.

Note that sources can be at any nodes of the network. However, without loss of generality, we
can modify the network by adding some fake leaves to source nodes which are not located at
leaves of the network. So, in the resulting network, sources are located at leaves (as an example,
see Figure 11).

Also, by adding some auxiliary nodes, one can place all sources at the same distance from
the receiver. Hence, we consider source nodes to be at distance $d_{\max}$ from the receiver. Consider
nodes of a tree with distance $i \geq 1$ from the receiver. We refer to them as stage $i$ of this
tree. Let $w_i$ be the number of such nodes. We refer to the $j$-th node of the $i$-th stage as $n_{ij}$. Its
outgoing link is denoted by $e_{ij}$. Suppose this node sends $M_{ij}$ over this edge with a rate $R_{ij}$ (it
maps length-$n$ blocks of $M_{ij}$, referred to as $\mathbf{M}_{ij}$, to $\{1, 2, \ldots, 2^{nR_{ij}}\}$).

If this node is a source node (i.e., $n_{d_{\max} j}$ for some $j$), then $M_{ij} = en_{X_j}(X_j)$, where $en_{X_j}$ is
the encoding function of source $j$.

Now, suppose this node is an intermediate node (i.e., $n_{ij}$, $i \notin \{1, d_{\max}\}$) with incoming
edges $e_{(i+1)1}, \ldots, e_{(i+1)q}$. We allow this node to compute a function (say $g_{ij}(\cdot)$). Hence,

$M_{ij} = g_{ij}(M_{(i+1)1}, \ldots, M_{(i+1)q})$.

The receiver has a decoder $r$ which maps $r : \prod_{1 \leq j \leq w_1} \{1, 2, \ldots, 2^{nR_{1j}}\} \to \mathcal{Z}^n$. Thus, the receiver
computes $r(M_{11}, \ldots, M_{1w_1}) = r'(en_{X_1}(\mathbf{X}_1), \ldots, en_{X_k}(\mathbf{X}_k))$. We refer to this encoding/decoding
scheme as an $n$-distributed functional code. Intermediate nodes are allowed to compute functions,
but have no demand of their own. The desired function $f(\mathbf{X}_1, \ldots, \mathbf{X}_k)$ at the receiver is the only
demand in the network. For any encoding/decoding scheme, the probability of error is defined
as

$$P_e^n = \Pr[(\mathbf{x}_1, \ldots, \mathbf{x}_k) : f(\mathbf{x}_1, \ldots, \mathbf{x}_k) \neq r'(en_{X_1}(\mathbf{x}_1), \ldots, en_{X_k}(\mathbf{x}_k))]. \tag{4}$$

A rate tuple of the network is the set of rates of its edges (i.e., $\{R_{ij}\}$ for valid $i$ and $j$). We say
a rate tuple is achievable iff there exists a coding scheme operating at these rates so that $P_e^n \to 0$
as $n \to \infty$. The achievable rate region is the set closure of the set of all achievable rates.

Fig. 11. a) Sources are not necessarily located in leaves b) By adding some fake nodes, one can assume sources are in leaves
of the modified tree.



B. Definitions and Prior Results 

In this part, first we present some definitions used in formulating our results. We also review 
some prior results. 

Definition 5. The characteristic graph $G_{X_1} = (V_{X_1}, E_{X_1})$ of $X_1$ with respect to $X_2$, $p(x_1,x_2)$,
and function $f(X_1,X_2)$ is defined as follows: $V_{X_1} = \mathcal{X}_1$ and an edge $(x_1^1, x_1^2) \in \mathcal{X}_1^2$ is in $E_{X_1}$
iff there exists an $x_2^1 \in \mathcal{X}_2$ such that $p(x_1^1, x_2^1)\,p(x_1^2, x_2^1) > 0$ and $f(x_1^1, x_2^1) \neq f(x_1^2, x_2^1)$.

In other words, in order to avoid confusion about the function $f(X_1,X_2)$ at the receiver, if
$(x_1^1, x_1^2) \in E_{X_1}$, descriptions of $x_1^1$ and $x_1^2$ must be different. Shannon first defined this when
studying the zero error capacity of noisy channels [1]. Witsenhausen [24] used this concept to
study a simplified version of our problem where one encodes $X_1$ to compute $f(X_1)$ with zero
distortion. The characteristic graph of $X_2$ with respect to $X_1$, $p(x_1,x_2)$, and $f(X_1, X_2)$ is defined
analogously and denoted by $G_{X_2}$. One can extend the definition of the characteristic graph to
the case of having more than two random variables. Suppose $X_1, \ldots, X_k$ are $k$ random variables
defined in Section II-A.

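Definition 6. The characteristic graph $G_{X_1} = (V_{X_1}, E_{X_1})$ of $X_1$ with respect to random variables
$X_2, \ldots, X_k$, $p(x_1, \ldots, x_k)$, and $f(X_1, \ldots, X_k)$ is defined as follows: $V_{X_1} = \mathcal{X}_1$ and an edge
$(x_1^1, x_1^2) \in \mathcal{X}_1^2$ is in $E_{X_1}$ if there exist $x_j^1 \in \mathcal{X}_j$ for $2 \leq j \leq k$ such that
$p(x_1^1, x_2^1, \ldots, x_k^1)\,p(x_1^2, x_2^1, \ldots, x_k^1) > 0$ and $f(x_1^1, x_2^1, \ldots, x_k^1) \neq f(x_1^2, x_2^1, \ldots, x_k^1)$.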

Example 7. To illustrate the idea of confusability and the characteristic graph, consider two
random variables $X_1$ and $X_2$ such that $\mathcal{X}_1 = \{0,1,2,3\}$ and $\mathcal{X}_2 = \{0,1\}$, where they are
uniformly and independently distributed on their own supports. Suppose $f(X_1, X_2) = (X_1 + X_2) \bmod 2$
is to be perfectly reconstructed at the receiver. Then, the characteristic graph of $X_1$ with
respect to $X_2$, $p(x_1,x_2) = \frac{1}{8}$, and $f$ is shown in Figure 4-a.
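The edge rule of the two-source characteristic graph can be turned into a short routine. The following is a minimal sketch, with the function name `characteristic_graph` and its argument layout chosen here for illustration; it is applied to the sources and function of Example 7.

```python
from itertools import combinations

def characteristic_graph(support_a, support_b, p, f):
    """Edge set of the characteristic graph of the first variable.

    Two values a1, a2 are connected iff some b has p(a1, b) * p(a2, b) > 0
    and f(a1, b) != f(a2, b).
    """
    edges = set()
    for a1, a2 in combinations(support_a, 2):
        for b in support_b:
            if p(a1, b) * p(a2, b) > 0 and f(a1, b) != f(a2, b):
                edges.add((a1, a2))
                break
    return edges

X1 = [0, 1, 2, 3]
X2 = [0, 1]

def uniform(a, b):
    return 1 / 8  # independent and uniform sources

def mod2_sum(a, b):
    return (a + b) % 2

edges_x1 = characteristic_graph(X1, X2, uniform, mod2_sum)
# The roles of the two variables are swapped to obtain the graph of X2.
edges_x2 = characteristic_graph(X2, X1, uniform, mod2_sum)
```

Here the graph of $X_1$ connects exactly the pairs of values with different parity, and the graph of $X_2$ is a single edge, which matches the discussion of Example 1.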

The following definition can be found in ETTl . 

Definition 8. Given a graph $G_{X_1} = (V_{X_1}, E_{X_1})$ and a distribution on its vertices $V_{X_1}$, graph
entropy is

$$H_{G_{X_1}}(X_1) = \min_{X_1 \in W_1 \in \Gamma(G_{X_1})} I(X_1; W_1), \tag{5}$$

where $\Gamma(G_{X_1})$ is the set of all maximal independent sets of $G_{X_1}$.

The notation $X_1 \in W_1 \in \Gamma(G_{X_1})$ means that we are minimizing over all distributions $p(w_1, x_1)$
such that $p(w_1, x_1) > 0$ implies $x_1 \in w_1$, where $w_1$ is a maximal independent set of the graph
$G_{X_1}$.

Example 9. Consider the scenario described in Example 7. For the characteristic graph of $X_1$
shown in Figure 4-a, the set of maximal independent sets is $\mathcal{W}_1 = \{\{0, 2\}, \{1, 3\}\}$. To minimize
$I(X_1;W_1) = H(X_1) - H(X_1|W_1) = \log(4) - H(X_1|W_1)$, one should maximize $H(X_1|W_1)$.
Because of the symmetry of the problem, to maximize $H(X_1|W_1)$, $p(w_1)$ must be uniform over
the two possible maximal independent sets of $G_{X_1}$. Since each maximal independent set $w_1 \in \mathcal{W}_1$ has
two $X_1$ values, $H(X_1|w_1) = \log(2)$ bit, and since $p(w_1)$ is uniform, $H(X_1|W_1) = \log(2)$
bit. Therefore, $H_{G_{X_1}}(X_1) = \log(4) - \log(2) = 1$ bit. One can see that if we want to encode $X_1$
ignoring the effect of the function $f$, we need $H(X_1) = \log(4) = 2$ bits. We will show that, for
this example, functional compression saves us 1 bit in every 2 bits compared to traditional
data compression.
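The value found in Example 9 can be reproduced numerically. In this sketch (helper names are hypothetical), each value of $X_1$ belongs to exactly one maximal independent set, so the constraint $x_1 \in w_1$ leaves a single feasible joint distribution and the minimization reduces to evaluating one mutual information.

```python
from math import log2

p_x = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
independent_sets = [frozenset({0, 2}), frozenset({1, 3})]

# p(w, x) > 0 only if x lies in w; here that fixes p(w | x) completely.
joint = {(x, w): p_x[x] for x in p_x for w in independent_sets if x in w}

def mutual_information(joint_dist):
    """I(X; W) in bits for a joint distribution given as {(x, w): prob}."""
    px, pw = {}, {}
    for (x, w), pr in joint_dist.items():
        px[x] = px.get(x, 0.0) + pr
        pw[w] = pw.get(w, 0.0) + pr
    return sum(
        pr * log2(pr / (px[x] * pw[w]))
        for (x, w), pr in joint_dist.items()
        if pr > 0
    )

graph_entropy = mutual_information(joint)           # graph entropy of X1
plain_entropy = -sum(q * log2(q) for q in p_x.values())  # H(X1)
```

For graphs in which a vertex lies in several maximal independent sets, the joint distribution is no longer forced and a genuine minimization over $p(w_1 | x_1)$ would be needed; the sketch does not cover that case.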

Witsenhausen [24] showed that the graph entropy is the minimum rate at which a single source
can be encoded so that a function of that source can be computed with zero distortion. Orlitsky
and Roche [2] defined an extension of Korner's graph entropy, the conditional graph entropy.

Definition 10. The conditional graph entropy is

$$H_{G_{X_1}}(X_1|X_2) = \min_{\substack{X_1 \in W_1 \in \Gamma(G_{X_1}) \\ W_1 - X_1 - X_2}} I(W_1; X_1|X_2). \tag{6}$$

The notation $W_1 - X_1 - X_2$ indicates a Markov chain. If $X_1$ and $X_2$ are independent, $H_{G_{X_1}}(X_1|X_2) = H_{G_{X_1}}(X_1)$.
To illustrate this concept, let us consider an example borrowed from [2].

Example 11. When $f(X_1, X_2) = X_1$, $H_{G_{X_1}}(X_1|X_2) = H(X_1|X_2)$.

To show this, consider the characteristic graph of $X_1$, denoted as $G_{X_1}$. Since $f(X_1, X_2) = X_1$,
for every $x_2 \in \mathcal{X}_2$, the possible values in the set $\{x_1 : p(x_1, x_2) > 0\}$ are connected to each other
(i.e., this set is a clique of $G_{X_1}$). Since the intersection of a clique and a maximal independent
set is a singleton, $X_2$ and the maximal independent set $W_1$ containing $X_1$ determine $X_1$. So,

$$\begin{aligned}
H_{G_{X_1}}(X_1|X_2) &= \min_{\substack{X_1 \in W_1 \in \Gamma(G_{X_1}) \\ W_1 - X_1 - X_2}} I(W_1; X_1|X_2) \\
&= H(X_1|X_2) - \max_{X_1 \in W_1 \in \Gamma(G_{X_1})} H(X_1|W_1, X_2) \\
&= H(X_1|X_2).
\end{aligned} \tag{7}$$

Definition 12. A vertex coloring of a graph $G_{X_1} = (V_{X_1}, E_{X_1})$ is a function $c_{G_{X_1}}(X_1) : V_{X_1} \to \mathbb{N}$
such that $(x_1^1, x_1^2) \in E_{X_1}$ implies $c_{G_{X_1}}(x_1^1) \neq c_{G_{X_1}}(x_1^2)$. The entropy of a coloring
is the entropy of the induced distribution on colors. Here, $p(c_{G_{X_1}}(x_1^i)) = p(c_{G_{X_1}}^{-1}(c_{G_{X_1}}(x_1^i)))$,
where $c_{G_{X_1}}^{-1}(c_{G_{X_1}}(x_1^i)) = \{x_1^j : c_{G_{X_1}}(x_1^j) = c_{G_{X_1}}(x_1^i)\}$ for all valid $j$. This subset of vertices with
the same color is called a color class. We refer to a coloring which minimizes the entropy as a
minimum entropy coloring. We use $C_{G_{X_1}}$ as the set of all valid colorings of a graph $G_{X_1}$.

Example 13. Consider again the random variable $X_1$ described in Example 7, whose
characteristic graph $G_{X_1}$ is shown in Figure 4-a. A valid coloring for $G_{X_1}$ is shown in Figure
5-a. One can see that, in this coloring, any two connected vertices are assigned different
colors. Specifically, $c_{G_{X_1}}(X_1) = \{r, b\}$. So, $p(c_{G_{X_1}}(x_1^i) = r) = p(x_1^i = 0) + p(x_1^i = 2)$, and
$p(c_{G_{X_1}}(x_1^i) = b) = p(x_1^i = 1) + p(x_1^i = 3)$.
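
The definition of coloring entropy and the color probabilities of Example 13 can be illustrated with a few lines of code. This is a minimal sketch: the edge set below is the 4-cycle obtained for the running example when $X_2$ is assumed uniform on {0, 1} and independent of $X_1$, and the function names are hypothetical.

```python
from math import log2

def is_valid_coloring(coloring, edges):
    """A coloring is valid when the endpoints of every edge get different colors."""
    return all(coloring[a] != coloring[b] for a, b in edges)

def coloring_entropy(coloring, p_x):
    """Entropy (in bits) of the distribution induced on colors."""
    mass = {}
    for x, px in p_x.items():
        mass[coloring[x]] = mass.get(coloring[x], 0.0) + px
    return -sum(q * log2(q) for q in mass.values() if q > 0)

# Assumed characteristic graph of X1 for the running example (a 4-cycle).
edges = {(0, 1), (1, 2), (2, 3), (0, 3)}
p_x1 = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

two_colors = {0: "r", 1: "b", 2: "r", 3: "b"}   # coloring of Example 13
trivial = {0: 0, 1: 1, 2: 2, 3: 3}              # one color per vertex

h_two = coloring_entropy(two_colors, p_x1)
h_trivial = coloring_entropy(trivial, p_x1)
```

With a uniform $X_1$, the two-color coloring has entropy 1 bit while the trivial coloring has entropy 2 bits, so the former is the better candidate for a minimum entropy coloring.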

We define a power graph of a characteristic graph as follows: 

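Definition 14. The $n$-th power of a graph $G_{X_1}$ is a graph $G_{X_1}^n = (V_{X_1}^n, E_{X_1}^n)$ such that $V_{X_1}^n = \mathcal{X}_1^n$
and $(\mathbf{x}_1^1, \mathbf{x}_1^2) \in E_{X_1}^n$ when there exists at least one $i$ such that $(x_{1i}^1, x_{1i}^2) \in E_{X_1}$. We denote a
valid coloring of $G_{X_1}^n$ by $c_{G_{X_1}^n}(\mathbf{X}_1)$.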
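
As an illustration of Definition 14, the following sketch builds the $n$-th power of a small graph by brute force. The function name and the example graph (a 4-cycle, as assumed for the running example) are illustrative choices, not part of the original text; the construction is exponential in $n$ and only meant for toy sizes.

```python
from itertools import product

def power_graph_edges(vertices, edges, n):
    """Return (vertex list, edge set) of the n-th power graph.

    Two length-n sequences are adjacent when at least one coordinate pair
    is an edge of the base graph.
    """
    sym = set(edges) | {(b, a) for a, b in edges}
    seqs = list(product(vertices, repeat=n))
    out = set()
    for u in seqs:
        for v in seqs:
            if u < v and any((ui, vi) in sym for ui, vi in zip(u, v)):
                out.add((u, v))
    return seqs, out

base_vertices = [0, 1, 2, 3]
base_edges = {(0, 1), (1, 2), (2, 3), (0, 3)}
seqs2, edges2 = power_graph_edges(base_vertices, base_edges, 2)
```

For the 4-cycle, each length-2 sequence is non-adjacent to only three other sequences, so the second power is much denser than the base graph.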

One may ignore atypical sequences in a sufficiently large power graph of a conflict graph and
then color that graph. This coloring is called an $\epsilon$-coloring of a graph and is defined as follows:

Definition 15. Given a non-empty set $\mathcal{A} \subset \mathcal{X}_1 \times \mathcal{X}_2$, define $\hat{p}(x_1, x_2) = p(x_1, x_2)/p(\mathcal{A})$ when
$(x_1, x_2) \in \mathcal{A}$, and $\hat{p}(x_1, x_2) = 0$ otherwise. $\hat{p}$ is the distribution over $(x_1, x_2)$ conditioned on
$(x_1, x_2) \in \mathcal{A}$. Denote the characteristic graph of $X_1$ with respect to $X_2$, $\hat{p}(x_1, x_2)$, and $f(X_1, X_2)$
as $\hat{G}_{X_1} = (V_{X_1}, \hat{E}_{X_1})$ and the characteristic graph of $X_2$ with respect to $X_1$, $\hat{p}(x_1, x_2)$, and
$f(X_1, X_2)$ as $\hat{G}_{X_2} = (V_{X_2}, \hat{E}_{X_2})$. Note that $\hat{E}_{X_1} \subseteq E_{X_1}$ and $\hat{E}_{X_2} \subseteq E_{X_2}$. Suppose $p(\mathcal{A}) \geq 1 - \epsilon$.
We say that $c_{G_{X_1}}(X_1)$ and $c_{G_{X_2}}(X_2)$ are $\epsilon$-colorings of $G_{X_1}$ and $G_{X_2}$ if they are valid colorings
of $\hat{G}_{X_1}$ and $\hat{G}_{X_2}$.



In [25], the chromatic entropy of a graph $G_{X_1}$ is defined as follows.

Definition 16.

$$H^{\chi}_{G_{X_1}}(X_1) = \min_{c_{G_{X_1}} \text{ is an } \epsilon\text{-coloring of } G_{X_1}} H(c_{G_{X_1}}(X_1)).$$

The chromatic entropy is a representation of the chromatic number of high probability
subgraphs of the characteristic graph. In [3], the conditional chromatic entropy is defined as follows.

Definition 17.

$$H^{\chi}_{G_{X_1}}(X_1|X_2) = \min_{c_{G_{X_1}} \text{ is an } \epsilon\text{-coloring of } G_{X_1}} H(c_{G_{X_1}}(X_1)|X_2).$$

Regardless of $\epsilon$, the above optimizations are minima, rather than infima, because there are
finitely many subgraphs of any fixed graph $G_{X_1}$, and therefore there are only finitely many
$\epsilon$-colorings, regardless of $\epsilon$.

In general, these optimizations are NP-hard ([15]). However, depending on the desired function $f$,
there are some interesting cases in which they are not NP-hard. We discuss these cases in Section
ED 

Körner showed in [21] that, in the limit of large $n$, there is a relation between the chromatic
entropy and the graph entropy.

Theorem 18.

$$\lim_{n \to \infty} \frac{1}{n} H^{\chi}_{G_{X_1}^n}(\mathbf{X}_1) = H_{G_{X_1}}(X_1). \tag{8}$$

This theorem implies that the receiver can asymptotically compute a deterministic function of
a discrete memoryless source by first coloring a sufficiently large power of the characteristic
graph of the source random variable with respect to the function, and then encoding the resulting
colors using any encoding scheme which achieves the entropy bound of the coloring random variable. In
the previous approach, to achieve an encoding rate close to the graph entropy of $X_1$, one should
find the optimal distribution over the set of maximal independent sets of $G_{X_1}$. This theorem instead
allows us to find the optimal coloring of $G_{X_1}^n$, rather than the optimal distribution on maximal
independent sets. One can see that this approach modularizes the encoding scheme into two
parts: a graph coloring module, followed by a Slepian-Wolf compression module.

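The conditional version of the above theorem is proven in [3].

Theorem 19.

$$\lim_{n \to \infty} \frac{1}{n} H^{\chi}_{G_{X_1}^n}(\mathbf{X}_1|\mathbf{X}_2) = H_{G_{X_1}}(X_1|X_2). \tag{9}$$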




This theorem implies a practical encoding scheme for the problem of functional compression
with side information, where the receiver wishes to compute $f(X_1, X_2)$ when $X_2$ is available
at the receiver as side information. Orlitsky and Roche showed in [2] that $H_{G_{X_1}}(X_1|X_2)$
is the minimum achievable rate for this problem. Their proof uses random coding arguments
and shows the existence of an optimal coding scheme. This theorem presents a modularized
encoding scheme where one first finds the minimum entropy coloring of $G_{X_1}^n$ for large enough
$n$, and then uses a compression scheme on the coloring random variable (such as Slepian-Wolf
compression) to achieve a rate arbitrarily close to $H(c_{G_{X_1}^n}(\mathbf{X}_1)|\mathbf{X}_2)$. This encoding scheme guarantees
computation of the function at the receiver with a vanishing probability of error.

All these results considered only functional compression with side information at the receiver
(Figure 1-a). Consider the network shown in Figure 1-b. It shows a network with two source
nodes and a receiver which wishes to compute a function of the sources' values. In general,
the rate region of this network has not been determined. However, [3] determined a rate region
of this network when the source random variables satisfy a condition called the zigzag condition,
defined below.

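We refer to the $\epsilon$-joint-typical set of sequences of random variables $\mathbf{X}_1, \ldots, \mathbf{X}_k$ as $T_\epsilon^n$. $k$ is
implied in this notation for simplicity. $T_\epsilon^n$ can be considered as a strong or weak typical set.

Definition 20. A discrete memoryless source $\{(X_1^i, X_2^i)\}_{i \in \mathbb{N}}$ with a distribution $p(x_1, x_2)$ satisfies
the zigzag condition if for any $\epsilon$ and some $n$, and any $(\mathbf{x}_1^1, \mathbf{x}_2^1), (\mathbf{x}_1^2, \mathbf{x}_2^2) \in T_\epsilon^n$, there exists some $(\mathbf{x}_1^3, \mathbf{x}_2^3) \in
T_\epsilon^n$ such that $(\mathbf{x}_1^3, \mathbf{x}_2^i), (\mathbf{x}_1^i, \mathbf{x}_2^3) \in T_{\epsilon/2}^n$ for each $i \in \{1, 2\}$, and $(x_{1j}^3, x_{2j}^3) = (x_{1j}^i, x_{2j}^{3-i})$ for some
$i \in \{1, 2\}$ for each $j$.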

In fact, the zigzag condition forces many source sequences to be typical. We first explain the
results of [3]. Then, in Section III-C, we compute a rate region without the need for the zigzag
condition. Then, we extend our results to the case of having $k$ source nodes.

Reference [3] shows that, if the source random variables satisfy the zigzag condition, an
achievable rate region for this network is the set of all rates that can be achieved through graph
colorings. The zigzag condition is a restrictive condition which does not depend on the desired
function at the receiver. This condition is sufficient, but not necessary.

C. A Rate Region for One-Stage Tree Networks 

In this section, we want to find a rate region for a general one stage tree network without 
having any restrictive conditions such as the zigzag condition. Consider the network shown in 
Figure 2 with $k$ sources.

Definition 21. A path with length $m$ between two points $Z_1 = (x_1^1, x_2^1, \ldots, x_k^1)$ and $Z_m =
(x_1^2, x_2^2, \ldots, x_k^2)$ is determined by $m - 1$ points $Z_i$, $1 \leq i \leq m$, such that,

i) $P(Z_i) > 0$, for all $1 \leq i \leq m$.

ii) $Z_i$ and $Z_{i+1}$ only differ in one of their coordinates.

Definition 21 can be generalized to two $n$-length vectors as follows.

Definition 22. A path with length $m$ between two points $Z_1 = (\mathbf{x}_1^1, \mathbf{x}_2^1, \ldots, \mathbf{x}_k^1) \in T_\epsilon^n$ and
$Z_m = (\mathbf{x}_1^2, \mathbf{x}_2^2, \ldots, \mathbf{x}_k^2) \in T_\epsilon^n$ is determined by $m - 1$ points $Z_i$, $1 \leq i \leq m$, such that,

i) $Z_i \in T_\epsilon^n$, for all $1 \leq i \leq m$.

ii) $Z_i$ and $Z_{i+1}$ only differ in one of their coordinates.

Note that each coordinate of $Z_i$ is a vector with length $n$.

Definition 23. A joint-coloring family $J_C$ for random variables $X_1, \ldots, X_k$ with characteristic
graphs $G_{X_1}, \ldots, G_{X_k}$, and any valid colorings $c_{G_{X_1}}, \ldots, c_{G_{X_k}}$, respectively, is defined as $J_C =
\{j_c^1, \ldots, j_c^{n_{j_c}}\}$, where $j_c^i$ is the collection of points $(x_1^{i_1}, x_2^{i_2}, \ldots, x_k^{i_k})$ whose coordinates have the
same color (i.e., $j_c^i = \{(x_1^{i_1}, x_2^{i_2}, \ldots, x_k^{i_k}), (x_1^{l_1}, x_2^{l_2}, \ldots, x_k^{l_k}) : c_{G_{X_1}}(x_1^{i_1}) = c_{G_{X_1}}(x_1^{l_1}), \ldots, c_{G_{X_k}}(x_k^{i_k}) =
c_{G_{X_k}}(x_k^{l_k})\}$, for any valid $i_1, \ldots, i_k$ and $l_1, \ldots, l_k$). Each $j_c^i$ is called a joint coloring class, where $n_{j_c}$
is the number of joint coloring classes of a joint coloring family.

We say a joint coloring class $j_c^i$ is connected if between any two points in $j_c^i$ there exists a
path that lies in $j_c^i$. Otherwise, it is disconnected. Definition 23 can be expressed for random
vectors $\mathbf{X}_1, \ldots, \mathbf{X}_k$ with characteristic graphs $G_{X_1}^n, \ldots, G_{X_k}^n$, and any valid $\epsilon$-colorings $c_{G_{X_1}^n}, \ldots, c_{G_{X_k}^n}$,
respectively.

Definition 24. Consider random variables $X_1, \ldots, X_k$ with characteristic graphs $G_{X_1}, \ldots, G_{X_k}$,
and any valid colorings $c_{G_{X_1}}, \ldots, c_{G_{X_k}}$. We say a joint coloring class $j_c^i \in J_C$ satisfies the
Coloring Connectivity Condition (C.C.C.) when it is connected, or when the disconnected parts of $j_c^i$ have
the same function value. We say colorings $c_{G_{X_1}}, \ldots, c_{G_{X_k}}$ satisfy C.C.C. when all joint coloring
classes satisfy C.C.C.

C.C.C. can be expressed for random vectors $\mathbf{X}_1, \ldots, \mathbf{X}_k$ with characteristic graphs $G_{X_1}^n, \ldots,
G_{X_k}^n$, and any valid $\epsilon$-colorings $c_{G_{X_1}^n}, \ldots, c_{G_{X_k}^n}$, respectively.

Fig. 12. Two examples of a joint coloring class: a) satisfying C.C.C. b) not satisfying C.C.C. Dark squares indicate points
with zero probability. Function values are depicted in the picture.

Example 25. Suppose we have two random variables $X_1$ and $X_2$ with characteristic
graphs $G_{X_1}$ and $G_{X_2}$. Let us assume $c_{G_{X_1}}$ and $c_{G_{X_2}}$ are two valid colorings of $G_{X_1}$ and $G_{X_2}$,
respectively. Assume $c_{G_{X_1}}(x_1^1) = c_{G_{X_1}}(x_1^2)$ and $c_{G_{X_2}}(x_2^1) = c_{G_{X_2}}(x_2^2)$. Suppose $j_c^1$ represents this
joint coloring class. In other words, $j_c^1 = \{(x_1^i, x_2^j)\}$, for all $1 \leq i, j \leq 2$ when $p(x_1^i, x_2^j) > 0$.
Figure 12 considers two different cases. The first case is when $p(x_1^1, x_2^2) = 0$, and the other points
have a non-zero probability. It is illustrated in Figure 12-a. One can see that there exists a
path between any two points in this joint coloring class. So, this joint coloring class satisfies
C.C.C. If the other joint coloring classes of $c_{G_{X_1}}$ and $c_{G_{X_2}}$ satisfy C.C.C., we say $c_{G_{X_1}}$ and $c_{G_{X_2}}$
satisfy C.C.C. Now, consider the second case depicted in Figure 12-b. In this case, we have
$p(x_1^1, x_2^2) = 0$, $p(x_1^2, x_2^1) = 0$, and the other points have a non-zero probability. One can see that
there is no path between $(x_1^1, x_2^1)$ and $(x_1^2, x_2^2)$ in $j_c^1$. So, though these two points belong to the
same joint coloring class, their corresponding function values can be different from each other.
Thus, $j_c^1$ does not satisfy C.C.C. for this example. Therefore, $c_{G_{X_1}}$ and $c_{G_{X_2}}$ do not satisfy C.C.C.
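
The two cases of Example 25 can be checked mechanically. The sketch below treats a joint coloring class as the set of its positive-probability points and searches for paths whose consecutive points differ in exactly one coordinate, as in Definition 21. The function names and the function values assigned to the points are hypothetical; they only stand in for the values drawn in the figure.

```python
from collections import deque

def connected_components(points):
    """Components of a joint coloring class under single-coordinate moves."""
    points = set(points)
    seen, comps = set(), []
    for start in points:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            z = queue.popleft()
            for w in points:
                # Consecutive path points differ in exactly one coordinate.
                if w not in seen and sum(a != b for a, b in zip(z, w)) == 1:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def satisfies_ccc(points, f):
    """True when the class is connected or all its points share one f value."""
    comps = connected_components(points)
    return len(comps) == 1 or len({f(*z) for z in points}) == 1

# Case (a): only (x_1^1, x_2^2) has zero probability; points are index pairs.
case_a = {(1, 1), (2, 1), (2, 2)}
# Case (b): both off-diagonal points have zero probability.
case_b = {(1, 1), (2, 2)}

# Hypothetical function values on the class.
differing = lambda i, j: 0 if (i, j) == (1, 1) else 1
constant = lambda i, j: 0
```

In case (b) the class is disconnected, so C.C.C. holds only if the two isolated points happen to carry the same function value.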

Lemma 26. Consider two random variables $X_1$ and $X_2$ with characteristic graphs $G_{X_1}$ and
$G_{X_2}$ and any valid colorings $c_{G_{X_1}}(X_1)$ and $c_{G_{X_2}}(X_2)$, respectively, where $c_{G_{X_2}}(X_2)$ is a trivial
coloring, assigning different colors to different vertices (to simplify the notation, we use
$c_{G_{X_2}}(X_2) = X_2$ to refer to this coloring). These colorings satisfy C.C.C. Also, $c_{G_{X_1}^n}(\mathbf{X}_1)$ and
$c_{G_{X_2}^n}(\mathbf{X}_2) = \mathbf{X}_2$ satisfy C.C.C., for any $n$.

Proof: First, we know that any random variable $X_2$ by itself is a trivial coloring of $G_{X_2}$ such
that each vertex of $G_{X_2}$ is assigned a different color. So, $J_C$ for $c_{G_{X_1}}(X_1)$ and $c_{G_{X_2}}(X_2) = X_2$
can be written as $J_C = \{j_c^1, \ldots, j_c^{n_{j_c}}\}$ such that $j_c^i = \{(x_1^i, x_2^1) : c_{G_{X_1}}(x_1^i) = \sigma_i\}$, where $\sigma_i$ is a
generic color. Any two points in $j_c^1$ are connected to each other with a path with length one.
So, $j_c^1$ satisfies C.C.C. This argument holds for any $j_c^i$ for any valid $i$. Thus, all joint coloring
classes and therefore, $c_{G_{X_1}}(X_1)$ and $c_{G_{X_2}}(X_2) = X_2$ satisfy C.C.C. The argument for $c_{G_{X_1}^n}(\mathbf{X}_1)$
and $c_{G_{X_2}^n}(\mathbf{X}_2) = \mathbf{X}_2$ is similar. ■

Lemma 27. Consider random variables $X_1, \ldots, X_k$ with characteristic graphs $G_{X_1}, \ldots, G_{X_k}$,
and any valid colorings $c_{G_{X_1}}, \ldots, c_{G_{X_k}}$ with joint coloring family $J_C = \{j_c^i : i\}$. For any two
points $(x_1^1, \ldots, x_k^1)$ and $(x_1^2, \ldots, x_k^2)$ in $j_c^i$, $f(x_1^1, \ldots, x_k^1) = f(x_1^2, \ldots, x_k^2)$ if and only if $j_c^i$ satisfies
C.C.C.

Proof: We first show that if $j_c^i$ satisfies C.C.C., then, for any two points $(x_1^1, \ldots, x_k^1)$ and
$(x_1^2, \ldots, x_k^2)$ in $j_c^i$, $f(x_1^1, \ldots, x_k^1) = f(x_1^2, \ldots, x_k^2)$. Since $j_c^i$ satisfies C.C.C., either $f(x_1^1, \ldots, x_k^1) =
f(x_1^2, \ldots, x_k^2)$, or there exists a path with length $m - 1$ between these two points $Z_1 = (x_1^1, \ldots, x_k^1)$
and $Z_m = (x_1^2, \ldots, x_k^2)$, for some $m$. Two consecutive points $Z_j$ and $Z_{j+1}$ in this path differ
in just one of their coordinates. Without loss of generality, suppose they differ in their first
coordinate. In other words, suppose $Z_j = (x_1^{j_1}, x_2^{j_2}, \ldots, x_k^{j_k})$ and $Z_{j+1} = (x_1^{j_0}, x_2^{j_2}, \ldots, x_k^{j_k})$. Since
these two points belong to $j_c^i$, $c_{G_{X_1}}(x_1^{j_1}) = c_{G_{X_1}}(x_1^{j_0})$. If $f(Z_j) \neq f(Z_{j+1})$, there would exist an
edge between $x_1^{j_1}$ and $x_1^{j_0}$ in $G_{X_1}$ and they could not have the same color. So, $f(Z_j) = f(Z_{j+1})$.
By applying the same argument inductively to all pairs of consecutive points in the path between
$Z_1$ and $Z_m$, one gets $f(Z_1) = f(Z_2) = \cdots = f(Z_m)$.

If $j_c^i$ does not satisfy C.C.C., there exist at least two points $Z_1$ and $Z_2$ in $j_c^i$
with different function values such that no path exists between them. As an example, consider
Figure 12-b. Hence, the function value is the same in a joint coloring class if and only if it satisfies C.C.C.

■

Lemma 28. Consider random variables $\mathbf{X}_1, \ldots, \mathbf{X}_k$ with characteristic graphs $G_{X_1}^n, \ldots, G_{X_k}^n$,
and any valid $\epsilon$-colorings $c_{G_{X_1}^n}, \ldots, c_{G_{X_k}^n}$ with joint coloring family $J_C = \{j_c^i : i\}$. For any two
points $(\mathbf{x}_1^1, \ldots, \mathbf{x}_k^1)$ and $(\mathbf{x}_1^2, \ldots, \mathbf{x}_k^2)$ in $j_c^i$, $f(\mathbf{x}_1^1, \ldots, \mathbf{x}_k^1) = f(\mathbf{x}_1^2, \ldots, \mathbf{x}_k^2)$ if and only if $j_c^i$ satisfies
C.C.C.

Proof: The proof is similar to that of Lemma 27. The only difference is to use the definition
of C.C.C. for $c_{G_{X_1}^n}, \ldots, c_{G_{X_k}^n}$. Since $j_c^i$ satisfies C.C.C., either $f(\mathbf{x}_1^1, \ldots, \mathbf{x}_k^1) = f(\mathbf{x}_1^2, \ldots, \mathbf{x}_k^2)$, or
there exists a path with length $m - 1$ between any two points $Z_1 = (\mathbf{x}_1^1, \ldots, \mathbf{x}_k^1) \in T_\epsilon^n$ and
$Z_m = (\mathbf{x}_1^2, \ldots, \mathbf{x}_k^2) \in T_\epsilon^n$ in $j_c^i$, for some $m$. Consider two consecutive points $Z_j$ and $Z_{j+1}$ in this
path. They differ in one of their coordinates (suppose they differ in their first coordinate). In
other words, suppose $Z_j = (\mathbf{x}_1^{j_1}, \mathbf{x}_2^{j_2}, \ldots, \mathbf{x}_k^{j_k}) \in T_\epsilon^n$ and $Z_{j+1} = (\mathbf{x}_1^{j_0}, \mathbf{x}_2^{j_2}, \ldots, \mathbf{x}_k^{j_k}) \in T_\epsilon^n$. Since these
two points belong to $j_c^i$, $c_{G_{X_1}^n}(\mathbf{x}_1^{j_1}) = c_{G_{X_1}^n}(\mathbf{x}_1^{j_0})$. If $f(Z_j) \neq f(Z_{j+1})$, there would exist an edge
between $\mathbf{x}_1^{j_1}$ and $\mathbf{x}_1^{j_0}$ in $G_{X_1}^n$ and they could not get the same color. Thus, $f(Z_j) = f(Z_{j+1})$. By
applying the same argument to all pairs of consecutive points in the path between $Z_1$ and $Z_m$, one
gets $f(Z_1) = f(Z_2) = \cdots = f(Z_m)$. The converse part is similar to that of Lemma 27. ■

Next, we want to show that, if $X_1$ and $X_2$ satisfy the zigzag condition given in Definition
20, any valid colorings of their characteristic graphs satisfy C.C.C., but not vice versa. In other
words, we want to show that the zigzag condition used in [3] is sufficient but not necessary.

Lemma 29. If two random variables $X_1$ and $X_2$ with characteristic graphs $G_{X_1}$ and $G_{X_2}$ satisfy
the zigzag condition, any valid colorings $c_{G_{X_1}}$ and $c_{G_{X_2}}$ of $G_{X_1}$ and $G_{X_2}$ satisfy C.C.C., but not
vice versa.

Proof: Suppose $X_1$ and $X_2$ satisfy the zigzag condition, and $c_{G_{X_1}}$ and $c_{G_{X_2}}$ are two valid
colorings of $G_{X_1}$ and $G_{X_2}$, respectively. We want to show that these colorings satisfy C.C.C.
To do this, consider two points $(x_1^1, x_2^1)$ and $(x_1^2, x_2^2)$ in a joint coloring class $j_c^i$. The definition
of the zigzag condition guarantees the existence of a path with length two between these two
points. Thus, $c_{G_{X_1}}$ and $c_{G_{X_2}}$ satisfy C.C.C.

The second part of this lemma says that the converse is not true. As an example,
in the special case considered in Lemma 26, the colorings always satisfy C.C.C.
without requiring any condition such as the zigzag condition. ■

Definition 30. For random variables $X_1, \ldots, X_k$ with characteristic graphs $G_{X_1}, \ldots, G_{X_k}$, the
joint graph entropy is defined as follows:

$$H_{G_{X_1}, \ldots, G_{X_k}}(X_1, \ldots, X_k) \triangleq \lim_{n \to \infty} \min_{c_{G_{X_1}^n}, \ldots, c_{G_{X_k}^n}} \frac{1}{n} H(c_{G_{X_1}^n}(\mathbf{X}_1), \ldots, c_{G_{X_k}^n}(\mathbf{X}_k)) \tag{10}$$

in which $c_{G_{X_1}^n}(\mathbf{X}_1), \ldots, c_{G_{X_k}^n}(\mathbf{X}_k)$ are $\epsilon$-colorings of $G_{X_1}^n, \ldots, G_{X_k}^n$ satisfying C.C.C. We refer
to this joint graph entropy as $H_{[G_{X_i}]_{i \in S}}$ where $S = \{1, 2, \ldots, k\}$. Note that this limit exists
because we have a monotonically decreasing sequence bounded below. Similarly, we can define
the conditional graph entropy.

Definition 31. For random variables $X_1, \ldots, X_k$ with characteristic graphs $G_{X_1}, \ldots, G_{X_k}$, the
conditional graph entropy can be defined as follows:

$$H_{G_{X_1}, \ldots, G_{X_i}}(X_1, \ldots, X_i|X_{i+1}, \ldots, X_k)
= \lim_{n \to \infty} \min \frac{1}{n} H(c_{G_{X_1}^n}(\mathbf{X}_1), \ldots, c_{G_{X_i}^n}(\mathbf{X}_i)|c_{G_{X_{i+1}}^n}(\mathbf{X}_{i+1}), \ldots, c_{G_{X_k}^n}(\mathbf{X}_k)) \tag{11}$$

where the minimization is over $c_{G_{X_1}^n}(\mathbf{X}_1), \ldots, c_{G_{X_k}^n}(\mathbf{X}_k)$, which are $\epsilon$-colorings of $G_{X_1}^n, \ldots, G_{X_k}^n$
satisfying C.C.C.

Lemma 32. For $k = 2$, Definitions 10 and 31 are the same.

Proof: By using the data processing inequality, we have

$$\begin{aligned}
H_{G_{X_1}}(X_1|X_2) &= \lim_{n \to \infty} \min_{c_{G_{X_1}^n}, c_{G_{X_2}^n}} \frac{1}{n} H(c_{G_{X_1}^n}(\mathbf{X}_1)|c_{G_{X_2}^n}(\mathbf{X}_2)) \\
&= \lim_{n \to \infty} \min_{c_{G_{X_1}^n}} \frac{1}{n} H(c_{G_{X_1}^n}(\mathbf{X}_1)|\mathbf{X}_2).
\end{aligned}$$

Then, Lemma 26 implies that $c_{G_{X_1}^n}(\mathbf{X}_1)$ and $c_{G_{X_2}^n}(\mathbf{X}_2) = \mathbf{X}_2$ satisfy C.C.C. A direct application
of Theorem 19 completes the proof. ■

Note that, by this definition, the graph entropy does not satisfy the chain rule.
Suppose $S(k)$ denotes the power set of the set $\{1, 2, \ldots, k\}$ excluding the empty subset. Then,
for any $S \in S(k)$,

$$X_S \triangleq \{X_i : i \in S\}.$$

Let $S^c$ denote the complement of $S$ in $\{1, 2, \ldots, k\}$. For $S = \{1, 2, \ldots, k\}$, denote $S^c$ as the empty set. To
simplify notation, we refer to a subset of sources by $X_S$. For instance, $S(2) = \{\{1\}, \{2\}, \{1, 2\}\}$,
and for $S = \{1, 2\}$, we write $H_{[G_{X_i}]_{i \in S}}(X_S)$ instead of $H_{G_{X_1}, G_{X_2}}(X_1, X_2)$.

Theorem 33. A rate region of the network shown in Figure 2 is determined by these conditions:

$$\forall S \in S(k) \implies \sum_{i \in S} R_{1i} \geq H_{[G_{X_i}]_{i \in S}}(X_S|X_{S^c}). \tag{12}$$

Proof: We first show the achievability of this rate region. We also propose a modularized
encoding/decoding scheme in this part. Then, for the converse, we show that no encoding/decoding
scheme can outperform this rate region.
1) Achievability:

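Lemma 34. Consider random variables $\mathbf{X}_1, \ldots, \mathbf{X}_k$ with characteristic graphs $G_{X_1}^n, \ldots, G_{X_k}^n$,
and any valid $\epsilon$-colorings $c_{G_{X_1}^n}, \ldots, c_{G_{X_k}^n}$ satisfying C.C.C., for sufficiently large $n$. There exists

$$\hat{f} : c_{G_{X_1}^n}(\mathcal{X}_1^n) \times \cdots \times c_{G_{X_k}^n}(\mathcal{X}_k^n) \to \mathcal{Z}^n \tag{13}$$

such that $\hat{f}(c_{G_{X_1}^n}(\mathbf{x}_1), \ldots, c_{G_{X_k}^n}(\mathbf{x}_k)) = f(\mathbf{x}_1, \ldots, \mathbf{x}_k)$, for all $(\mathbf{x}_1, \ldots, \mathbf{x}_k) \in T_\epsilon^n$.

Proof: Suppose the joint coloring family for these colorings is $J_C = \{j_c^i : i\}$. We proceed
by constructing $\hat{f}$. Assume $(\mathbf{x}_1^1, \ldots, \mathbf{x}_k^1) \in j_c^i$ and $c_{G_{X_1}^n}(\mathbf{x}_1^1) = \sigma_1, \ldots, c_{G_{X_k}^n}(\mathbf{x}_k^1) = \sigma_k$. Define
$\hat{f}(\sigma_1, \ldots, \sigma_k) = f(\mathbf{x}_1^1, \ldots, \mathbf{x}_k^1)$.

To show this function is well-defined on elements in its support, we should show that for any
two points $(\mathbf{x}_1^1, \ldots, \mathbf{x}_k^1)$ and $(\mathbf{x}_1^2, \ldots, \mathbf{x}_k^2)$ in $T_\epsilon^n$, if $c_{G_{X_1}^n}(\mathbf{x}_1^1) = c_{G_{X_1}^n}(\mathbf{x}_1^2), \ldots, c_{G_{X_k}^n}(\mathbf{x}_k^1) = c_{G_{X_k}^n}(\mathbf{x}_k^2)$,
then $f(\mathbf{x}_1^1, \ldots, \mathbf{x}_k^1) = f(\mathbf{x}_1^2, \ldots, \mathbf{x}_k^2)$.

Since $c_{G_{X_1}^n}(\mathbf{x}_1^1) = c_{G_{X_1}^n}(\mathbf{x}_1^2), \ldots, c_{G_{X_k}^n}(\mathbf{x}_k^1) = c_{G_{X_k}^n}(\mathbf{x}_k^2)$, these two points belong to a joint
coloring class such as $j_c^i$. Since $c_{G_{X_1}^n}, \ldots, c_{G_{X_k}^n}$ satisfy C.C.C., by using Lemma 28, $f(\mathbf{x}_1^1, \ldots, \mathbf{x}_k^1) =
f(\mathbf{x}_1^2, \ldots, \mathbf{x}_k^2)$. Therefore, our function $\hat{f}$ is well-defined and has the desired property. ■

Lemma 34 implies that, given $\epsilon$-colorings of characteristic graphs of random variables satisfying
C.C.C. at the receiver, we can successfully compute the desired function $f$ with a vanishing
probability of error as $n$ goes to infinity. Thus, if the decoder at the receiver is given colors,
it can look up $f$ based on its table of $\hat{f}$. The question remains of at which rates encoders can
transmit these colors to the receiver faithfully (with a probability of error less than $\epsilon$).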
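
The construction in the proof of Lemma 34 amounts to filling a look-up table indexed by color tuples. The following sketch does this for single-letter colorings ($n = 1$) and raises an error when two points with identical colors disagree on the function value, which is the situation C.C.C. rules out. The function name and the example colorings are illustrative assumptions; the example uses the running function $(x_1 + x_2) \bmod 2$ with $X_2$ assumed to take values in {0, 1}.

```python
def build_lookup_table(points, colorings, f):
    """Return f_hat as a dict keyed by color tuples.

    points: iterable of source tuples with positive probability.
    colorings: one dict per source, mapping source value -> color.
    Raises ValueError when the colors do not determine the function value.
    """
    table = {}
    for z in points:
        key = tuple(c[x] for c, x in zip(colorings, z))
        value = f(*z)
        if table.setdefault(key, value) != value:
            raise ValueError(f"colors {key} do not determine f")
    return table

def f(a, b):
    return (a + b) % 2

points = [(a, b) for a in range(4) for b in range(2)]
c1 = {0: "r", 1: "b", 2: "r", 3: "b"}   # two-color coloring of X1
c2 = {0: 0, 1: 1}                       # trivial coloring of X2
f_hat = build_lookup_table(points, [c1, c2], f)
```

The receiver in this sketch would recover the function value as `f_hat[(color_1, color_2)]` once the colors have been decoded.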

Lemma 35. (Slepian-Wolf Theorem)
A rate region of the network shown in Figure 2, where $f(X_1, \ldots, X_k) = (X_1, \ldots, X_k)$, can be
determined by these conditions:

$$\forall S \in S(k) \implies \sum_{i \in S} R_{1i} \geq H(X_S|X_{S^c}). \tag{14}$$



Proof: See @. 



We now use the Slepian-Wolf (SW) encoding/decoding scheme on the achieved coloring random
variables. Suppose the probability of error in each decoder of SW is less than $\frac{\epsilon}{k}$. Then, the
total error in the decoding of colorings at the receiver is less than $\epsilon$. Therefore, the total error
in the coding scheme of first coloring $G_{X_1}^n, \ldots, G_{X_k}^n$, and then encoding those colors by using the
SW encoding/decoding scheme is upper bounded by the sum of errors in each stage. By using
Lemmas 34 and 35, the total error is less than $\epsilon$, and goes to zero as $n$ goes to infinity. By
applying Lemma 35 on the achieved coloring random variables, we have,

$$\forall S \in S(k) \implies \sum_{i \in S} R_{1i} \geq \frac{1}{n} H(c_{G_{X_S}^n}|c_{G_{X_{S^c}}^n}), \tag{15}$$

where $c_{G_{X_S}^n}$ and $c_{G_{X_{S^c}}^n}$ are $\epsilon$-colorings of characteristic graphs satisfying C.C.C. Thus, using
Definition 31 completes the achievability part.

As an example, look at Figure 1-c. This network has two source nodes and a receiver. Source
nodes compute $\epsilon$-colorings of their characteristic graphs. These colorings should satisfy C.C.C.
Then, SW compression is performed on these colorings. The receiver first performs SW
decoding to get the colors. Then, by using a look-up table, it can find the value of its desired
function (as an example, look at Figure 5).

2) Converse: Here, we show that any distributed functional source coding scheme with a small
probability of error induces $\epsilon$-colorings on characteristic graphs of random variables satisfying
C.C.C. Suppose $\epsilon > 0$. Define $\mathcal{F}_\epsilon^n$ for all $(n, \epsilon)$ as follows,

$$\mathcal{F}_\epsilon^n = \{\hat{f} : \Pr[\hat{f}(\mathbf{X}_1, \ldots, \mathbf{X}_k) \neq f(\mathbf{X}_1, \ldots, \mathbf{X}_k)] < \epsilon\}. \tag{16}$$

In other words, $\mathcal{F}_\epsilon^n$ is the set of all functions equal to $f$ with $\epsilon$ probability of error. For large
enough $n$, all achievable functional source codes are in $\mathcal{F}_\epsilon^n$. We call these codes $\epsilon$-achievable
functional codes.

Lemma 36. Consider some function $f : \mathcal{X}_1 \times \cdots \times \mathcal{X}_k \to \mathcal{Z}$. Any distributed functional code
which reconstructs this function with zero error probability induces colorings on $G_{X_1}, \ldots, G_{X_k}$
with respect to this function, where these colorings satisfy C.C.C.

Proof: To show this lemma, let us assume we have a zero-error distributed functional code
represented by encoders $en_{X_1}, \ldots, en_{X_k}$ and a decoder $r'$. Since it is error free, for any two
points $(x_1^1, \ldots, x_k^1)$ and $(x_1^2, \ldots, x_k^2)$, if $p(x_1^1, \ldots, x_k^1) > 0$, $p(x_1^2, \ldots, x_k^2) > 0$, $en_{X_1}(x_1^1) = en_{X_1}(x_1^2)$,
$\ldots$, $en_{X_k}(x_k^1) = en_{X_k}(x_k^2)$, then,

$$f(x_1^1, \ldots, x_k^1) = f(x_1^2, \ldots, x_k^2) = r'(en_{X_1}(x_1^1), \ldots, en_{X_k}(x_k^1)). \tag{17}$$

We want to show that $en_{X_1}, \ldots, en_{X_k}$ are valid colorings of $G_{X_1}, \ldots, G_{X_k}$ satisfying C.C.C.
We demonstrate this argument for $X_1$. The argument for the other random variables is analogous.
First, we show that $en_{X_1}$ induces a valid coloring on $G_{X_1}$, and then, we show that this coloring
satisfies C.C.C. Let us proceed by contradiction. If $en_{X_1}$ did not induce a coloring on $G_{X_1}$,
there would be some edge in $G_{X_1}$ whose two vertices have the same color. Let us call these
vertices $x_1^1$ and $x_1^2$. Since these vertices are connected in $G_{X_1}$, there must exist a $(x_2^1, \ldots, x_k^1)$
such that $p(x_1^1, x_2^1, \ldots, x_k^1) p(x_1^2, x_2^1, \ldots, x_k^1) > 0$, $en_{X_1}(x_1^1) = en_{X_1}(x_1^2)$, and $f(x_1^1, x_2^1, \ldots, x_k^1) \neq
f(x_1^2, x_2^1, \ldots, x_k^1)$. By taking $x_2^1 = x_2^2, \ldots, x_k^1 = x_k^2$ in (17), one can see that this is not possible. So,
the contradiction assumption is wrong and $en_{X_1}$ induces a valid coloring on $G_{X_1}$.

Now, we should show that these induced colorings satisfy C.C.C. If this were not true,
there would exist two points $(x_1^1, \ldots, x_k^1)$ and $(x_1^2, \ldots, x_k^2)$ in a joint coloring class $j_c^i$ such
that there is no path between them in $j_c^i$. So, Lemma 27 says that the function $f$ can take different
values at these two points. In other words, it is possible to have $f(x_1^1, \ldots, x_k^1) \neq f(x_1^2, \ldots, x_k^2)$,
where $c_{G_{X_1}}(x_1^1) = c_{G_{X_1}}(x_1^2), \ldots, c_{G_{X_k}}(x_k^1) = c_{G_{X_k}}(x_k^2)$, which is in contradiction with (17). Thus,
these colorings satisfy C.C.C. ■

In the last step, we should show that any achievable functional code represented by $\mathcal{F}_\epsilon^n$ induces
$\epsilon$-colorings on characteristic graphs satisfying C.C.C.

Lemma 37. Consider random variables $\mathbf{X}_1, \ldots, \mathbf{X}_k$. All $\epsilon$-achievable functional codes of these
random variables induce $\epsilon$-colorings on characteristic graphs satisfying C.C.C.

Proof: Suppose $g(\mathbf{x}_1, \ldots, \mathbf{x}_k) = r'(en_{X_1}(\mathbf{x}_1), \ldots, en_{X_k}(\mathbf{x}_k)) \in \mathcal{F}_\epsilon^n$ is such a code. Lemma 36
says that a zero-error reconstruction of $g$ induces some colorings on characteristic graphs satisfying
C.C.C., with respect to $g$. Let the set of all points $(\mathbf{x}_1, \ldots, \mathbf{x}_k)$ such that $g(\mathbf{x}_1, \ldots, \mathbf{x}_k) \neq
f(\mathbf{x}_1, \ldots, \mathbf{x}_k)$ be denoted by $\mathcal{C}$. Since $g \in \mathcal{F}_\epsilon^n$, $\Pr[\mathcal{C}] < \epsilon$. Therefore, the functions $en_{X_1}, \ldots, en_{X_k}$
restricted to the complement of $\mathcal{C}$ are $\epsilon$-colorings of characteristic graphs satisfying C.C.C. (by definition). ■

Lemmas 36 and 37 establish the converse part and complete the proof. ■

Fig. 13. (a) A simple tree topology, (b) A completed tree.

If we have two transmitters ($k = 2$), Theorem 33 can be simplified as follows.

Corollary 38. A rate region of the network shown in Figure 1-b is determined by these three
conditions:

$$\begin{aligned}
R_{11} &\geq H_{G_{X_1}}(X_1|X_2) \\
R_{12} &\geq H_{G_{X_2}}(X_2|X_1) \\
R_{11} + R_{12} &\geq H_{G_{X_1}, G_{X_2}}(X_1, X_2).
\end{aligned} \tag{18}$$

D. A Rate Lower Bound for a General Tree Network 

In this section, we compute a rate lower bound of an arbitrary tree network with $k$ correlated
sources at its leaves and a receiver at its root (see Figure 3). We refer to the other nodes of this
tree as intermediate nodes. The receiver wishes to compute a deterministic function of the source
random variables. Intermediate nodes have no demand of their own, but they are allowed to
perform computation. Computing the desired function $f$ at the receiver is the only demand of
the network. For independent sources, we propose a modularized coding scheme that performs
arbitrarily closely to the derived rate lower bounds.

First, we propose a framework to categorize tree networks and their nodes.

Definition 39. For an arbitrary tree network,

• The distance of each node is the number of hops in the path between that node and the
receiver.

• $d_{max}$ is the distance of the farthest node from the receiver.

• A complete tree is a tree such that all source nodes are at distance $d_{max}$ from the receiver.

• An auxiliary node is a new node connected to a leaf of a tree, which increases the leaf's
distance by one. The added link is called an auxiliary link. The leaf in the original tree to
which an auxiliary node is added is called the actual node corresponding to that auxiliary
node. The link in the original tree connected to the actual node is called the actual link
corresponding to that auxiliary link.

• For any given tree, one can complete it by adding consecutive auxiliary nodes to its
leaves whose distances are less than $d_{max}$. The resulting tree is called the completed tree
and this process is called tree completion.
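
Tree completion as described in Definition 39 can be sketched as follows. The tree is represented by a child-to-parent map, which is a representation chosen for this sketch; the node names, the `aux_...` naming scheme, and the small example tree (one intermediate node with two sources plus one source attached directly to the receiver) are hypothetical.

```python
def node_distances(parent, root):
    """Number of hops from every node to the root (the receiver)."""
    dist = {root: 0}
    def d(v):
        if v not in dist:
            dist[v] = d(parent[v]) + 1
        return dist[v]
    for v in parent:
        d(v)
    return dist

def complete_tree(parent, root):
    """Return a new parent map in which every leaf sits at distance d_max."""
    dist = node_distances(parent, root)
    d_max = max(dist.values())
    internal = set(parent.values())
    leaves = [v for v in parent if v not in internal]
    completed = dict(parent)
    for leaf in leaves:
        tail = leaf
        # Chain auxiliary nodes below the leaf until it reaches depth d_max.
        for step in range(d_max - dist[leaf]):
            aux = f"aux_{leaf}_{step + 1}"   # hypothetical naming scheme
            completed[aux] = tail
            tail = aux
    return completed

# Hypothetical tree: sources x1, x2 feed node n11; source x3 feeds the receiver.
tree = {"n11": "rx", "x3": "rx", "x1": "n11", "x2": "n11"}
completed = complete_tree(tree, "rx")
```

In this example only `x3` is closer than $d_{max} = 2$, so a single auxiliary node is attached below it; `x3` is then the actual node of that auxiliary node.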

These concepts are depicted in Figure 13. Auxiliary nodes in the completed tree network act
as intermediate nodes. Note that, all functions that may be computed in auxiliary nodes can 
be gathered in their corresponding actual node in the original tree. So, the rate of the actual 
link in the original tree network is the minimum of rates of corresponding auxiliary links in the 
completed tree. Thus, if we compute the rate-region for the completed tree of any given arbitrary 
tree, we can compute the rate-region of the original tree. Therefore, in the rest of this section, 
we consider the rate-region of completed tree networks. 

Definition 40. Consider a completed tree network with $k$ source nodes at distance $d_{max}$ from
the receiver. Nodes at distance $i$ from the receiver are called the $i$-th stage of the tree. $w_i$ is the
number of nodes in the $i$-th stage. $\Xi_{ij}$ is a subset of source random variables whose paths to the
receiver have the last $i$ common hops. $\Xi_{ij}$ is called the source variables of node $n_{ij}$. A connection
set of a completed tree is defined as $S_T = \{s_T^i : 1 \leq i \leq d_{max}\}$, where $s_T^i = \{\Xi_{ij} : 1 \leq j \leq w_i\}$.
Note that any completed tree can be expressed by a connection set.

For example, consider the network shown in Figure 13-b. Its connection set is $S_T = \{s_T^1, s_T^2\}$
such that $s_T^1 = \{(X_1, X_2), X_3\}$ and $s_T^2 = \{X_1, X_2, X_3\}$. In other words, $\Xi_{11} = (X_1, X_2)$, $\Xi_{12} =
X_3$, $\Xi_{21} = X_1$, $\Xi_{22} = X_2$ and $\Xi_{23} = X_3$. One can see that $S_T$ completely describes the structure
of the tree.
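
The connection set of Definition 40 can be computed stage by stage from a completed tree. The sketch below again uses a child-to-parent map; the node labels and the `source_of` mapping are hypothetical names chosen to mirror the example above.

```python
def connection_set(parent, root, source_of):
    """Map stage i to the list of source-variable tuples of the nodes at distance i."""
    children = {}
    for v, u in parent.items():
        children.setdefault(u, []).append(v)

    def sources(v):
        # A leaf carries its own source; an inner node collects its subtree.
        if v not in children:
            return (source_of[v],)
        return tuple(s for c in children[v] for s in sources(c))

    stages, frontier, i = {}, children.get(root, []), 1
    while frontier:
        stages[i] = [sources(v) for v in frontier]
        frontier = [c for v in frontier for c in children.get(v, [])]
        i += 1
    return stages

# Completed tree with two stages, mirroring the example connection set.
parent = {"n11": "rx", "n12": "rx", "n21": "n11", "n22": "n11", "n23": "n12"}
source_of = {"n21": "X1", "n22": "X2", "n23": "X3"}
stages = connection_set(parent, "rx", source_of)
```

For this tree, stage 1 groups $(X_1, X_2)$ and $X_3$, and stage 2 lists the three sources individually, matching $s_T^1$ and $s_T^2$ in the example.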

We have three types of nodes: source nodes, intermediate nodes and a receiver. Source nodes 
process their messages and transmit them. Intermediate nodes can compute some functions 
of their received information. The receiver processes the received information to compute its 
desired function. For example, consider the network shown in Figure 13-b. Random variables
sent through links $e_{21}$, $e_{22}$, $e_{23}$, $e_{11}$ and $e_{12}$ are $M_{21}$, $M_{22}$, $M_{23}$, $M_{11}$ and $M_{12}$ such that
$M_{11} = g_{11}(M_{21}, M_{22})$, and $M_{12} = g_{12}(M_{23})$.

1) A Rate Lower Bound: Consider node $n_{ij}$ of a tree. Let $S(w_i)$ be the power set of the set
$\{1, 2, \ldots, w_i\}$ and $s_i \in S(w_i)$ be a non-empty subset of $\{1, 2, \ldots, w_i\}$.

Theorem 41. A rate lower bound of a tree network with the connection set $S_T = \{s_T^i : i\}$ can
be determined by these conditions,

$$\forall s_i \in S(w_i) \implies \sum_{j \in s_i} R_{ij} \geq H_{[G_{\Xi_{ij}}]_{j \in s_i}}(\Xi_{is_i}|\Xi_{is_i^c}) \tag{19}$$

for all $i = 1, \ldots, |S_T|$, where $\Xi_{is_i} = \bigcup_{j \in s_i} \Xi_{ij}$ and $\Xi_{is_i^c} = \{X_1, \ldots, X_k\} \setminus \{\Xi_{is_i}\}$.

Proof: In this part, we want to show that no coding scheme can outperform this rate region.
Consider the nodes $n_{ij}$ in the $i$-th stage of this network, for $1 \leq j \leq w_i$. Suppose they are directly
connected to the receiver. So, the information sent in links of this stage should be sufficient to
compute the desired function. Suppose their parent nodes sent all their information without doing
any compression. So, by direct application of Theorem 33, (19) can be derived. This argument
can be repeated for all stages. Thus, no coding scheme can outperform these bounds. ■

In the following, we express some cases under which we can achieve the derived rate lower
bound of Theorem 41.

2) Tightness of the Rate Lower Bound for Independent Sources: In this part, we propose a 
functional coding scheme to achieve the rate lower bound. Suppose random variables Xi, 
Xfc with characteristic graphs Gv,, Gy arQ independent. Assume cg™ , cg™ are valid 

*- lk Xi Xj, 




(19) 
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e-colorings of these characteristic graphs satisfying C.C.C. The proposed coding scheme can 
be described as follows: source nodes first compute colorings of high probability subgraphs of 
their characteristic graphs satisfying C.C.C, and then, perform source coding on these coloring 
random variables. Intermediate nodes first compute their parents' coloring random variables, and 
then, by using a look-up table, find corresponding source values of their received colorings. Then, 
they compute e-colorings of their own characteristic graphs. The corresponding source values 
of their received colorings form an independent set in the graph. If all are assigned to a single 
color in the minimum entropy coloring, intermediate nodes send this coloring random variable 
followed by a source coding. But, if vertices of this independent set are assigned to different 
colors, intermediate nodes send the coloring with the lowest entropy followed by source coding 
(Slepian-Wolf). The receiver first performs minimum entropy decoding ([23]) on its received
information and obtains the coloring random variables. Then, it uses a look-up table to compute
its desired function from the obtained colorings.

To show the achievability, we show that, if nodes of each stage were directly connected to the receiver, the receiver could compute its desired function. Consider the node $n_{ij}$ in the $i$-th stage of the network. Since the source values corresponding to its received colorings form an independent set on its characteristic graph ($G_{\Xi_{ij}}$) and this node computes the minimum entropy coloring of this graph, this is equivalent to the case where it receives the exact source information, because both lead to the same coloring random variable. So, if all nodes of stage $i$ were directly connected to the receiver, the receiver could compute its desired function and link rates would satisfy the following conditions:

\[ \forall s_i \in S(w_i) \implies \sum_{j \in s_i} R_{ij} \ge H_{[G_{\Xi_{ij}}]_{j \in s_i}}(\Xi_{i s_i}). \tag{20} \]

Thus, by a simple induction argument, one can see that the proposed scheme is achievable and can perform arbitrarily closely to the derived rate lower bound when the sources are independent.

E. A Case When Intermediate Nodes Do not Need to Compute 

Though the coding scheme proposed in Section II-D2 can perform arbitrarily closely to the rate lower bound, it may require computation at intermediate nodes.

Definition 42. Suppose $f(X_1, \ldots, X_k)$ is a deterministic function of random variables $X_1, \ldots, X_k$. $(f, X_1, \ldots, X_k)$ is called a chain-rule proper set when, for any $s \in S(k)$, $H_{[G_{X_i}]_{i \in s}}(X_s) = H_{G_{X_s}}(X_s)$.

Theorem 43. In a general tree network, if sources $X_1, \ldots, X_k$ are independent random variables and $(f, X_1, \ldots, X_k)$ is a chain-rule proper set, it is sufficient to have intermediate nodes act as relays to perform arbitrarily closely to the rate lower bound of Theorem 41.

Proof: Consider an intermediate node $n_{ij}$ in the $i$-th stage of the network whose corresponding source random variables are $X_s$, where $s \in S(k)$ (i.e., $X_s = \Xi_{ij}$). Since the random variables are independent, one can write the rate bounds of Theorem 41 as

\[ \forall s_i \in S(w_i) \implies \sum_{j \in s_i} R_{ij} \ge H_{[G_{\Xi_{ij}}]_{j \in s_i}}(\Xi_{i s_i}). \tag{21} \]

Now, consider the outgoing link rate of the node $n_{ij}$. If this intermediate node acts like a relay, we have $R_{ij} = H_{[G_{X_i}]_{i \in s}}(X_s)$ (since $X_s = \Xi_{ij}$). If $(f, X_1, \ldots, X_k)$ is a chain-rule proper set, we can write

\[ R_{ij} = H_{[G_{X_i}]_{i \in s}}(X_s) = H_{G_{X_s}}(X_s) = H_{G_{\Xi_{ij}}}(\Xi_{ij}). \tag{22} \]

For any intermediate node $n_{ij}$ with $j \in s_i$ and $s_i \in S(w_i)$, a similar argument leads to conditions (21). As mentioned in Theorem 41, to perform arbitrarily closely to the rate lower bound, this node needs to compress its received information up to the rate $H_{G_{X_s}}(X_s)$. If this node acted as a relay and forwarded the received information from the previous stage, it would lead to an achievable rate $H_{[G_{X_i}]_{i \in s}}(X_s)$ in the next stage, which in general is not equal to $H_{G_{X_s}}(X_s)$. So, in general, this scheme cannot achieve the rate lower bound. However, if for any $s \in S(k)$, $H_{[G_{X_i}]_{i \in s}}(X_s) = H_{G_{X_s}}(X_s)$, this scheme can perform arbitrarily closely to the rate lower bound by having intermediate nodes act as relays. ■

In the following Lemma, we provide a sufficient condition to guarantee that a set is a chain-rule 
proper set. 

Lemma 44. Suppose $X_1$ and $X_2$ are independent and $f(X_1, X_2)$ is a deterministic function. If for any $x_2^1$ and $x_2^2$ in $\mathcal{X}_2$ we have $f(x_1^i, x_2^1) \neq f(x_1^j, x_2^2)$ for any possible $i$ and $j$, then $(f, X_1, X_2)$ is a chain-rule proper set.

Fig. 14. An example of $G_{X_1,X_2}$ satisfying conditions of Lemma 44 when $\mathcal{X}_2$ has two members.

Proof: We show that under this condition any coloring of the graph $G_{X_1,X_2}$ can be expressed as colorings of $G_{X_1}$ and $G_{X_2}$, and vice versa. The converse part is straightforward because any colorings of $G_{X_1}$ and $G_{X_2}$ can be viewed as a coloring of $G_{X_1,X_2}$.

Consider Figure 14, which illustrates the conditions of this lemma. Under these conditions, since all $x_2$ in $\mathcal{X}_2$ have different function values, the graph $G_{X_1,X_2}$ can be decomposed into subgraphs that have the same topology as $G_{X_1}$, one for each $x_2$ in $\mathcal{X}_2$. These subgraphs are
fully connected to each other under conditions of Corollary HU Thus, any coloring of this graph 
can be represented as two colorings of $G_{X_1}$ and $G_{X_2}$, the latter being a complete graph. Hence, the minimum entropy coloring of $G_{X_1,X_2}$ is equal to the minimum entropy coloring of $(G_{X_1}, G_{X_2})$. Therefore, $H_{G_{X_1},G_{X_2}}(X_1, X_2) = H_{G_{X_1,X_2}}(X_1, X_2)$. ■
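As an illustration, the hypothesis of Lemma 44 can be checked mechanically on small alphabets. The following is a minimal Python sketch, assuming the condition is read for distinct values of $x_2$ (two different symbols of $\mathcal{X}_2$ never produce a common function value). The helper name `satisfies_lemma_condition` and the two toy functions are hypothetical and serve only as an example.

```python
from itertools import combinations

def satisfies_lemma_condition(f, xs1, xs2):
    """Return True if distinct values of x2 never share a function value.

    For any two different symbols a, b in xs2, this checks that
    f(x1, a) != f(x1', b) for all x1, x1' in xs1.
    """
    # Image of f for each fixed x2.
    images = {b: {f(a, b) for a in xs1} for b in xs2}
    return all(images[a].isdisjoint(images[b]) for a, b in combinations(xs2, 2))

xs1 = [0, 1, 2]
xs2 = [0, 1]
# Takes values {0, 1} when x2 = 0 and {10, 11} when x2 = 1.
f_ok = lambda x1, x2: (x1 % 2) + 10 * x2
# The sum does not separate the x2 values, since 1 + 0 == 0 + 1.
f_bad = lambda x1, x2: x1 + x2

print(satisfies_lemma_condition(f_ok, xs1, xs2))
print(satisfies_lemma_condition(f_bad, xs1, xs2))
```

When the check succeeds, the function value alone reveals $x_2$, which is the property the proof uses to split $G_{X_1,X_2}$ into fully connected copies of $G_{X_1}$.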

III. Multi-Functional Compression with Side Information 

In this section, we consider the problem of multi-functional compression with side information. The problem is how to compress a source $X$ so that the receiver is able to compute some deterministic functions $f_1(X, Y_1), \ldots, f_m(X, Y_m)$, where $Y_i$, $1 \le i \le m$, are available at the receiver as side information.

Section II only considers the case where the receiver desires to compute one function ($m = 1$). Here, we consider a case where computation of several functions with different side information is desired at the receiver. Our results do not depend on the fact that all desired functions are in one receiver, and one can apply them to the case of having several receivers with different desired functions (i.e., functions are separable). We define a new concept named the multi-functional graph entropy, which is an extension of the graph entropy defined by Körner in [21]. We show that the minimum achievable rate for this problem is equal to the conditional multi-functional graph entropy of the source random variable given the side information. We also propose a coding scheme based on graph colorings to achieve this rate.

Fig. 15. A multi-functional compression problem with side information.

A. Problem Setup 

Consider discrete memoryless sources (i.e., $\{X^i\}_{i=1}^{\infty}$ and $\{Y_k^i\}_{i=1}^{\infty}$) and assume that these sources are drawn from finite sets $\mathcal{X}$ and $\mathcal{Y}_k$ with a joint distribution $p_k(x, y_k)$. We express $n$-sequences of these random variables as $\mathbf{X} = \{X^i\}_{i=l}^{i=l+n-1}$ and $\mathbf{Y}_k = \{Y_k^i\}_{i=l}^{i=l+n-1}$ with joint probability distribution $p_k(\mathbf{x}, \mathbf{y}_k)$. Assume $k = 1, \ldots, m$ (we have $m$ random variables as side information at the receiver).

The receiver wants to compute $m$ deterministic functions $f_k : \mathcal{X} \times \mathcal{Y}_k \to \mathcal{Z}_k$, or $f_k : \mathcal{X}^n \times \mathcal{Y}_k^n \to \mathcal{Z}_k^n$, its vector extension. Without loss of generality, we assume $l = 1$, and to simplify notation, $n$ will be implied by the context. We have one encoder $en_X$ and $m$ decoders $r_1, \ldots, r_m$ (one for each function and its corresponding side information). Encoder $en_X$ maps

\[ en_X : \mathcal{X}^n \to \{1, \ldots, 2^{nR}\}, \tag{23} \]

and each decoder $r_k$ maps

\[ r_k : \{1, \ldots, 2^{nR}\} \times \{1, \ldots, 2^{n}\} \to \mathcal{Z}_k^n. \tag{24} \]

The probability of error in decoding $f_k$ is

\[ P_e^{n_k} = \Pr[(\mathbf{x}, \mathbf{y}_k) : f_k(\mathbf{x}, \mathbf{y}_k) \neq r_k(en_X(\mathbf{x}), \mathbf{y}_k)], \tag{25} \]

and the total probability of error is

\[ P_e^n = 1 - \prod_{k} (1 - P_e^{n_k}). \tag{26} \]

We declare an error when we have an error in computation of at least one function. A rate $R$ is achievable if $P_e^n \to 0$ when $n \to \infty$. Our aim here is to find the minimum achievable rate.

B. Main Results 

Prior work on the functional compression problem considers the case where computation of a single function is desired at the receiver ($m = 1$). In this section, we consider the case where computation of several functions is desired at the receiver. As an example, consider the network shown in Figure 15. The receiver wants to compute $m$ functions with different side information random variables. We want to compute the minimum achievable rate for this case. Note that our results do not depend on the fact that all functions are desired in one receiver, and one can extend them to the case of having several receivers with different desired functions (i.e., functions are separable).

First, let us consider the case of $m = 2$. Then, we extend our results to the case of arbitrary $m$. In this problem, the receiver wants to compute two deterministic functions $f_1(X, Y_1)$ and $f_2(X, Y_2)$, where $Y_1$ and $Y_2$ are available at the receiver as side information. We wish to find the minimum achievable rate for this problem.

Let us call $G_{X,f_1} = (V, E_{f_1})$ the characteristic graph of $X$ with respect to $Y_1$, $p_1(x, y_1)$ and $f_1(X, Y_1)$, and $G_{X,f_2} = (V, E_{f_2})$ the characteristic graph of $X$ with respect to $Y_2$, $p_2(x, y_2)$ and $f_2(X, Y_2)$. Now, define $G_{X,f_1,f_2} = (V, E_{f_1,f_2})$ such that $E_{f_1,f_2} = E_{f_1} \cup E_{f_2}$. In other words, $G_{X,f_1,f_2}$ is the OR function of $G_{X,f_1}$ and $G_{X,f_2}$. We call $G_{X,f_1,f_2}$ the multi-functional characteristic graph of $X$.

When we deal with one function, we drop $f$ from notations (as in Section II).

Definition 45. The multi-functional characteristic graph $G_{X,f_1,f_2} = (V, E_{f_1,f_2})$ of $X$ with respect to $Y_1$, $Y_2$, $p_1(x, y_1)$, $p_2(x, y_2)$, and $f_1(x, y_1)$, $f_2(x, y_2)$ is defined as follows: $V = \mathcal{X}$ and an edge $(x_1, x_2) \in \mathcal{X}^2$ is in $E_{f_1,f_2}$ iff there exists a $y_1 \in \mathcal{Y}_1$ such that $p_1(x_1, y_1) p_1(x_2, y_1) > 0$ and $f_1(x_1, y_1) \neq f_1(x_2, y_1)$, or there exists a $y_2 \in \mathcal{Y}_2$ such that $p_2(x_1, y_2) p_2(x_2, y_2) > 0$ and $f_2(x_1, y_2) \neq f_2(x_2, y_2)$.

Fig. 16. A source coding scheme for multi-functional compression problem with side information.
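The construction in Definition 45 can be sketched in a few lines of Python. This is only an illustrative sketch for block length one: the helper names `characteristic_edges` and `multi_functional_edges`, the uniform toy distribution, and the two toy functions are assumptions made for the example and do not come from the paper.

```python
from itertools import combinations

def characteristic_edges(xs, ys, p, f):
    """Edges {x, x'} for which some y has p(x,y)p(x',y) > 0 and f(x,y) != f(x',y)."""
    edges = set()
    for a, b in combinations(xs, 2):
        if any(p(a, y) * p(b, y) > 0 and f(a, y) != f(b, y) for y in ys):
            edges.add(frozenset((a, b)))
    return edges

def multi_functional_edges(xs, sides):
    """Union (OR) of the per-function edge sets; `sides` lists (ys, p, f) triples."""
    edges = set()
    for ys, p, f in sides:
        edges |= characteristic_edges(xs, ys, p, f)
    return edges

xs = [0, 1, 2, 3]
ys = [0, 1]
uniform = lambda x, y: 1 / 8          # full-support joint pmf (toy assumption)
f1 = lambda x, y: (x % 2) ^ y         # depends on the parity of x
f2 = lambda x, y: (x // 2) + y        # depends on the high bit of x

e1 = characteristic_edges(xs, ys, uniform, f1)
e2 = characteristic_edges(xs, ys, uniform, f2)
e12 = multi_functional_edges(xs, [(ys, uniform, f1), (ys, uniform, f2)])
print(sorted(tuple(sorted(e)) for e in e12))
```

In this toy case each single-function graph has four edges, while their union is the complete graph on four vertices: computing both functions forces the encoder to distinguish every pair of source symbols, even though neither function alone does.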

Similarly to Definition [TH we define the multi-functional graph entropy as follows: 

Theorem 46.

\[ H_{G_{X,f_1,f_2}}(X) = \lim_{n \to \infty} \frac{1}{n} H^{\chi}_{G^n_{X,f_1,f_2}}(\mathbf{X}). \tag{27} \]

The conditional multi-functional graph entropy can be defined similarly to Definition 19 as follows:

Theorem 47.

\[ H_{G_{X,f_1,f_2}}(X|Y) = \lim_{n \to \infty} \frac{1}{n} H^{\chi}_{G^n_{X,f_1,f_2}}(\mathbf{X}|\mathbf{Y}), \tag{28} \]

where $G^n_{X,f_1,f_2}$ is the $n$-th power of $G_{X,f_1,f_2}$. Now, we can state the following theorem.

Theorem 48. $H_{G_{X,f_1,f_2}}(X|Y_1, Y_2)$ is the minimum achievable rate for the network shown in Figure 15 when $m = 2$.

Proof: To show this, we first show that $R_X \ge H_{G_{X,f_1,f_2}}(X|Y_1, Y_2)$ is an achievable rate (achievability), and that no scheme can outperform this rate (converse). To do this, first, we show that any valid coloring of $G^n_{X,f_1,f_2}$ for any $n$ leads to an achievable encoding-decoding scheme for this problem (achievability). Then, we show that every achievable encoding-decoding scheme performing on blocks with length $n$ induces a valid coloring of $G^n_{X,f_1,f_2}$ (converse).

Achievability: According to (3), any valid coloring of $G^n_{X,f_1}$ leads to successful computation of $f_1(\mathbf{X}, \mathbf{Y}_1)$ at the receiver. If $c_{G^n_{X,f_1}}$ is a valid coloring of $G^n_{X,f_1}$, there exists a function $r_1$ such that $r_1(c_{G^n_{X,f_1}}(\mathbf{X}), \mathbf{Y}_1) = f_1(\mathbf{X}, \mathbf{Y}_1)$, with high probability. A similar argument holds for $G_{X,f_2}$.

Now, assume that $c_{G^n_{X,f_1,f_2}}$ is a valid coloring of $G^n_{X,f_1,f_2}$. Since $E^n_{f_1} \subseteq E^n_{f_1,f_2}$ and $E^n_{f_2} \subseteq E^n_{f_1,f_2}$, any valid coloring of $G^n_{X,f_1,f_2}$ induces valid colorings for $G^n_{X,f_1}$ and $G^n_{X,f_2}$. Thus, any valid coloring of $G^n_{X,f_1,f_2}$ leads to successful computation of $f_1(\mathbf{X}, \mathbf{Y}_1)$ and $f_2(\mathbf{X}, \mathbf{Y}_2)$ at the receiver. So, $c_{G^n_{X,f_1,f_2}}$ leads to an achievable encoding scheme (i.e., there exist two functions $r_1$ and $r_2$ such that $r_1(c_{G^n_{X,f_1,f_2}}(\mathbf{X}), \mathbf{Y}_1) = f_1(\mathbf{X}, \mathbf{Y}_1)$ and $r_2(c_{G^n_{X,f_1,f_2}}(\mathbf{X}), \mathbf{Y}_2) = f_2(\mathbf{X}, \mathbf{Y}_2)$, with high probability).

When the receiver wants the whole information of the source node, Slepian and Wolf proposed a technique in [1] to compress the source random variable $X$ up to the rate $H(X|Y)$ when $Y$ is available at the receiver. Here, one can apply the Slepian-Wolf compression technique to the minimum entropy coloring of a large enough power graph and obtain the given bound.

Converse: Now, we show that any achievable encoding-decoding scheme performing on blocks with length $n$ induces a valid coloring of $G^n_{X,f_1,f_2}$. In other words, we want to show that if there exist functions $en_X$, $r_1$ and $r_2$ such that $r_1(en_X(\mathbf{X}), \mathbf{Y}_1) = f_1(\mathbf{X}, \mathbf{Y}_1)$ and $r_2(en_X(\mathbf{X}), \mathbf{Y}_2) = f_2(\mathbf{X}, \mathbf{Y}_2)$, then $en_X(\mathbf{X})$ is a valid coloring of $G^n_{X,f_1,f_2}$.

Let us proceed by contradiction. If $en_X(\mathbf{X})$ were not a valid coloring of $G^n_{X,f_1,f_2}$, there must be some edge in $E^n_{f_1,f_2}$ whose two vertices have the same color. Let us call these two vertices $\mathbf{x}_1$ and $\mathbf{x}_2$; they take the same value (i.e., $en_X(\mathbf{x}_1) = en_X(\mathbf{x}_2)$), but are also connected. Since they are connected to each other, by definition of $G^n_{X,f_1,f_2}$ there exists a $\mathbf{y}_1 \in \mathcal{Y}_1^n$ such that $p_1(\mathbf{x}_1, \mathbf{y}_1) p_1(\mathbf{x}_2, \mathbf{y}_1) > 0$ and $f_1(\mathbf{x}_1, \mathbf{y}_1) \neq f_1(\mathbf{x}_2, \mathbf{y}_1)$, or there exists a $\mathbf{y}_2 \in \mathcal{Y}_2^n$ such that $p_2(\mathbf{x}_1, \mathbf{y}_2) p_2(\mathbf{x}_2, \mathbf{y}_2) > 0$ and $f_2(\mathbf{x}_1, \mathbf{y}_2) \neq f_2(\mathbf{x}_2, \mathbf{y}_2)$. Without loss of generality, assume that the first case occurs. Thus, we have a $\mathbf{y}_1 \in \mathcal{Y}_1^n$ such that $p_1(\mathbf{x}_1, \mathbf{y}_1) p_1(\mathbf{x}_2, \mathbf{y}_1) > 0$ and $f_1(\mathbf{x}_1, \mathbf{y}_1) \neq f_1(\mathbf{x}_2, \mathbf{y}_1)$. So, $r_1(en_X(\mathbf{x}_1), \mathbf{y}_1) \neq r_1(en_X(\mathbf{x}_2), \mathbf{y}_1)$. Since $en_X(\mathbf{x}_1) = en_X(\mathbf{x}_2)$, this gives $r_1(en_X(\mathbf{x}_1), \mathbf{y}_1) \neq r_1(en_X(\mathbf{x}_1), \mathbf{y}_1)$, which is not possible. Thus, our contradiction assumption was not true. In other words, any achievable encoding-decoding scheme for this problem induces a valid coloring of $G^n_{X,f_1,f_2}$, which completes the proof. ■
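The link between valid colorings and decodability used in this proof can be illustrated for block length one. The Python sketch below is a toy check under stated assumptions: the helper `decoding_table`, the uniform distribution, the two functions and the two colorings are all hypothetical. It builds the receiver's look-up table from a coloring and reports a failure exactly when two same-colored source symbols disagree on the function value for a common side-information symbol.

```python
def decoding_table(xs, ys, p, f, color):
    """Build the receiver look-up table (color, y) -> f(x, y).

    Returns None if two source symbols with the same color and a common
    positive-probability y disagree on f, i.e. the coloring is not valid
    for the characteristic graph of f.
    """
    table = {}
    for x in xs:
        for y in ys:
            if p(x, y) == 0:
                continue
            key = (color[x], y)
            if table.setdefault(key, f(x, y)) != f(x, y):
                return None
    return table

xs, ys = [0, 1, 2, 3], [0, 1]
p = lambda x, y: 1 / 8                      # full-support toy distribution
f1 = lambda x, y: (x % 2) * y               # needs the parity of x when y = 1
f2 = lambda x, y: int(x == 3) + y           # needs to know whether x equals 3

union_valid = {0: "a", 2: "a", 1: "b", 3: "c"}   # proper on E_f1 union E_f2
f1_only = {0: "a", 2: "a", 1: "b", 3: "b"}       # proper on E_f1 only

print(decoding_table(xs, ys, p, f1, union_valid) is not None)
print(decoding_table(xs, ys, p, f2, union_valid) is not None)
print(decoding_table(xs, ys, p, f2, f1_only) is not None)
```

The coloring that is valid only for the graph of $f_1$ merges the symbols 1 and 3, which $f_2$ must distinguish, so no decoder for $f_2$ exists; the coloring of the union graph supports both decoders.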
Now, let us consider the network shown in Figure 15, where the receiver wishes to compute $m$ deterministic functions of source information having some side information.

Theorem 49. $H_{G_{X,f_1,\ldots,f_m}}(X|Y_1, \ldots, Y_m)$ is the minimum achievable rate for the network shown in Figure 15.

The argument here is similar to the case of $m = 2$ given in Theorem 48. We only sketch the proof. One may first show that any coloring of the multi-functional characteristic graph of $X$ with respect to the desired functions (i.e., $G_{X,f_1,\ldots,f_m}$) leads to an achievable scheme. Then, showing that any achievable encoding-decoding scheme induces a coloring on $G^n_{X,f_1,\ldots,f_m}$ completes the proof.

IV. Feedback in Functional Compression 

In this section, we investigate the effect of having feedback on the rate-region of the functional 
compression problem. If the function at the receiver is the identity function, this problem is 
Slepian-Wolf compression with feedback. For this case, having feedback does not improve rate 
bounds. For example, reference [22] considers both zero-error and asymptotically zero-error
Slepian-Wolf compression with feedback. However, for a general desired function at the receiver, 
having feedback may improve rate bounds of the case without feedback. 

A. Main Results 

Consider a distributed functional compression problem with two sources and a receiver, depicted in Figure 17-a. This network does not have feedback. In Section II, we derived a rate-region for this network. In this section, we consider the effect of having feedback on the rate-region of the network. For simplicity, we consider a simple distributed network topology with two sources. However, one can extend all discussions to more general networks of the type considered in Sections II and III.

Consider the network shown in Figure 17-b. If the desired function at the receiver is the identity function, this problem is Slepian-Wolf compression with feedback. For this case, having feedback does not change the rate region ([4] and [22]). However, when we have a general function at the receiver, by having feedback, one may improve the rate bounds of Theorem 38.

Theorem 50. Having feedback may improve rate bounds of Theorem 38.

Fig. 17. A distributed functional compression network a) without feedback b) with feedback.



Proof: Consider the network without feedback depicted in Figure 17-a. In Section II, we showed an achievable scheme where sources send their minimum entropy colorings of high probability subgraphs of their characteristic graphs satisfying C.C.C., followed by Slepian-Wolf compression. This scheme performs arbitrarily closely to the rate bounds derived in Theorem 38. Now, we seek to show that, in some cases, by having feedback, one can outperform these bounds. Consider source random variables $X_1$ and $X_2$ with characteristic graphs $G_{X_1}$ and $G_{X_2}$, respectively. Suppose $S^{\min}_{G^n_{X_1},G^n_{X_2}}$ and $S^{C.C.C.}_{G^n_{X_1},G^n_{X_2}}$ are two sets of joint colorings of source random variables defined as follows:

\[
\begin{aligned}
S^{\min}_{G^n_{X_1},G^n_{X_2}} &= \arg\min_{(c_{G^n_{X_1}}, c_{G^n_{X_2}}) \in C_{G^n_{X_1}} \times C_{G^n_{X_2}}} \frac{1}{n} H(c_{G^n_{X_1}}, c_{G^n_{X_2}}), \\
S^{C.C.C.}_{G^n_{X_1},G^n_{X_2}} &= \arg\min_{\substack{(c_{G^n_{X_1}}, c_{G^n_{X_2}}) \in C_{G^n_{X_1}} \times C_{G^n_{X_2}} \\ \text{satisfying C.C.C.}}} \frac{1}{n} H(c_{G^n_{X_1}}, c_{G^n_{X_2}}).
\end{aligned}
\tag{29}
\]

Now, consider the case when $S^{\min}_{G^n_{X_1},G^n_{X_2}} \cap S^{C.C.C.}_{G^n_{X_1},G^n_{X_2}} = \emptyset$, i.e., suppose any $c^{\min}_{G^n_{X_1},G^n_{X_2}} \in S^{\min}_{G^n_{X_1},G^n_{X_2}}$ does not satisfy C.C.C. Thus, C.C.C. restricts the link sum rates of any achievable scheme, because $H(c^{\min}_{G^n_{X_1},G^n_{X_2}}) < H(c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}})$ for any $c^{\min}_{G^n_{X_1},G^n_{X_2}} \in S^{\min}_{G^n_{X_1},G^n_{X_2}}$ and $c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}} \in S^{C.C.C.}_{G^n_{X_1},G^n_{X_2}}$.

Choose any two joint colorings $c^{\min}_{G^n_{X_1},G^n_{X_2}} \in S^{\min}_{G^n_{X_1},G^n_{X_2}}$ and $c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}} \in S^{C.C.C.}_{G^n_{X_1},G^n_{X_2}}$. Suppose set $A$ contains all points $(\mathbf{x}_1, \mathbf{x}_2)$ such that their corresponding colors in the joint-coloring class of $c^{\min}_{G^n_{X_1},G^n_{X_2}}$ do not satisfy C.C.C. Now, we propose a coding scheme with feedback which can outperform the rate bounds of the case without feedback. If sources know whether or not they have some sequences in $A$, they can switch between $c^{\min}_{G^n_{X_1},G^n_{X_2}}$ and $c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}}$ in their coding scheme with feedback. Since $H(c^{\min}_{G^n_{X_1},G^n_{X_2}}) < H(c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}})$, this approach outperforms the one without feedback in terms of rates. In the following, we present a possible feedback scheme.

Fig. 18. An example of the proposed feedback scheme. a) Since $\mathbf{x}_1 \notin A_{x_1}$, source $X_1$ sends 0. Since $\mathbf{x}_2 \in A_{x_2}$, source $X_2$ sends 1. b) The receiver forwards signaling bits to the sources. Then, sources can use the coloring scheme $c^{\min}_{G^n_{X_1},G^n_{X_2}}$.

Before sending each sequence, sources first check whether their sequences belong to $A$. Say $A_{x_1}$ is the set of all $\mathbf{x}_1$ such that there exists an $\mathbf{x}_2$ with $(\mathbf{x}_1, \mathbf{x}_2) \in A$. $A_{x_2}$ is defined similarly. One can see that $A \subseteq A_{x_1} \times A_{x_2}$. So, instead of checking whether a sequence is in $A$, by exchanging some information, sources check whether the sequence belongs to $A_{x_1} \times A_{x_2}$. In order to do this, source $X_1$ sends a one to the receiver when $\mathbf{x}_1 \in A_{x_1}$. Otherwise, it sends a zero. Source $X_2$ uses a similar scheme. The receiver exchanges these bits using feedback channels.

When a source sends a one and receives a one from its feedback channel, it uses $c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}}$ as its joint coloring. Otherwise, it uses $c^{\min}_{G^n_{X_1},G^n_{X_2}}$ in its coding scheme. Depending on which joint coloring scheme has been used by the sources, the receiver uses a corresponding look-up table to compute the desired function. Hence, this scheme is achievable. An example of this scheme is depicted in Figure 18.
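The signaling round described above can be sketched as follows. This is a minimal Python illustration, not the paper's implementation: the function name `choose_coloring`, the string labels `"ccc"` and `"min"`, and the toy set `A` are assumptions introduced for the example.

```python
def choose_coloring(x1, x2, A):
    """Simulate the signaling round; return which joint coloring both sources use.

    A is the set of pairs (x1, x2) whose colors under the unconstrained
    minimum entropy joint coloring violate C.C.C.
    """
    A1 = {a for a, _ in A}            # projection of A on source 1
    A2 = {b for _, b in A}            # projection of A on source 2
    bit1 = int(x1 in A1)              # source 1 -> receiver
    bit2 = int(x2 in A2)              # source 2 -> receiver
    fb1, fb2 = bit2, bit1             # receiver forwards each bit to the other source
    use_ccc_1 = bit1 == 1 and fb1 == 1
    use_ccc_2 = bit2 == 1 and fb2 == 1
    assert use_ccc_1 == use_ccc_2     # both sources reach the same decision
    return "ccc" if use_ccc_1 else "min"

A = {(0, 1), (2, 3)}
print(choose_coloring(0, 1, A))   # pair in A
print(choose_coloring(0, 3, A))   # not in A, but inside the product of projections
print(choose_coloring(1, 1, A))   # source 1 signals 0
```

Because the sources test membership in $A_{x_1} \times A_{x_2}$ rather than in $A$ itself, the pair $(0, 3)$ in this toy example is also sent with the C.C.C. coloring. This is conservative but keeps the decision consistent at both sources using one bit in each direction.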

Since the length of sequences is arbitrarily large, one can ignore these four extra signaling bits in rate computation. If we did not have feedback, according to Theorem 38,

\[ R_{11} + R_{12} \ge \frac{1}{n} H(c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}}). \tag{30} \]

Say $P_a = \Pr[(\mathbf{x}_1, \mathbf{x}_2) \in A_{x_1} \times A_{x_2}]$. Thus, for the proposed coding scheme with feedback, we have

\[ R^f_{11} + R^f_{12} \ge \frac{1}{n} \left[ P_a H(c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}}) + (1 - P_a) H(c^{\min}_{G^n_{X_1},G^n_{X_2}}) \right], \tag{31} \]

where $R^f_{1i}$ is the transmission rate of source $i$ with feedback. Thus,

\[ [R^f_{11} + R^f_{12}] - [R_{11} + R_{12}] \ge \frac{1}{n} (1 - P_a) \left[ H(c^{\min}_{G^n_{X_1},G^n_{X_2}}) - H(c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}}) \right]. \tag{32} \]

Since $H(c^{\min}_{G^n_{X_1},G^n_{X_2}}) \le H(c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}})$, the right-hand side of (32) is non-positive, and its magnitude represents the gain in link sum rates obtained by having feedback. When $P_a \neq 1$ and $c^{\min}_{G^n_{X_1},G^n_{X_2}} \neq c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}}$, this gain is strictly positive, which means the proposed coding scheme with feedback outperforms the one without feedback in terms of rate bounds. For the identity function at the receiver, $c^{\min}_{G^n_{X_1},G^n_{X_2}} = c^{C.C.C.}_{G^n_{X_1},G^n_{X_2}}$, and the proposed coding scheme with feedback does not improve rate bounds. Note that, for the identity function at the receiver, Slepian-Wolf compression can perform arbitrarily closely to min-cut max-flow bounds. ■

Note that, in a general network, for cases where the minimum entropy colorings of sources satisfy C.C.C., it is not known whether or not feedback can improve rate bounds.
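To make the comparison between (30) and (31) concrete, the following Python sketch evaluates both sum-rate bounds for given coloring entropies. The function name `sum_rate_bounds` and the numerical values (entropies of 1.2 and 1.5 bits, $P_a = 0.25$) are purely hypothetical and chosen only to illustrate the arithmetic.

```python
def sum_rate_bounds(h_min, h_ccc, p_a, n=1):
    """Return (no-feedback bound, feedback bound) on the sum rate, per (30) and (31).

    h_min: entropy of the unconstrained minimum entropy joint coloring.
    h_ccc: entropy of the minimum entropy joint coloring satisfying C.C.C.
    p_a:   probability that the source pair falls in the product of projections of A.
    """
    without_fb = h_ccc / n
    with_fb = (p_a * h_ccc + (1 - p_a) * h_min) / n
    return without_fb, with_fb

no_fb, fb = sum_rate_bounds(h_min=1.2, h_ccc=1.5, p_a=0.25)
print(no_fb, fb, no_fb - fb)
```

The saving `no_fb - fb` equals $(1 - P_a)[H(c^{C.C.C.}) - H(c^{\min})]/n$, so it vanishes when $P_a = 1$ or when the two colorings have the same entropy, in line with the discussion above.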

V. A Rate-Distortion Region for Distributed Functional Compression 

In this section, we consider the problem of distributed functional compression with distortion. 
The objective is to compress correlated discrete sources so that an arbitrary deterministic function 
of those sources can be computed up to a distortion level at the receiver. In this section, we 
derive a rate-distortion region for a network with two transmitters and a receiver. All discussions 
can be extended to more general networks considered in Sections II and III.

A recent result presented in [3] computes a rate-distortion region for the side information problem. The result in [3] gives a characterization of Yamamoto's rate distortion function [10] in terms of a reconstruction function. Here, we extend these results to the distributed functional compression problem. In this case, we compute a rate-distortion region and then propose a practical coding scheme with a non-trivial performance guarantee. Note that this
proposed characterization is not a single letter characterization. 

A. Problem Setup 

Consider two sources as described in Section IV-A. The receiver wants to compute a deterministic function $f : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Z}$ or $f : \mathcal{X}_1^n \times \mathcal{X}_2^n \to \mathcal{Z}^n$, its vector extension, up to distortion $D$ with respect to a given distortion function $d : \mathcal{Z} \times \mathcal{Z} \to [0, \infty)$. A vector extension of the distortion function is defined as follows:

\[ d(\mathbf{z}_1, \mathbf{z}_2) = \frac{1}{n} \sum_{i=1}^{n} d(z_{1i}, z_{2i}), \tag{33} \]

where $\mathbf{z}_1, \mathbf{z}_2 \in \mathcal{Z}^n$. As in [5], we assume that $d(z_1, z_2) = 0$ if and only if $z_1 = z_2$. This assumption causes the vector extension to satisfy the same property (i.e., $d(\mathbf{z}_1, \mathbf{z}_2) = 0$ if and only if $\mathbf{z}_1 = \mathbf{z}_2$).
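The vector extension (33) is a per-symbol average, which the short Python sketch below makes explicit. The function name `vector_distortion` and the absolute-error distortion used in the example are assumptions for illustration.

```python
def vector_distortion(d, z1, z2):
    """Average per-symbol distortion between two equal-length sequences, as in (33)."""
    if len(z1) != len(z2):
        raise ValueError("sequences must have the same length")
    return sum(d(a, b) for a, b in zip(z1, z2)) / len(z1)

abs_err = lambda a, b: abs(a - b)   # toy distortion with d(a, b) = 0 iff a == b
print(vector_distortion(abs_err, [1, 2, 3, 4], [1, 2, 5, 4]))
print(vector_distortion(abs_err, [1, 2, 3, 4], [1, 2, 3, 4]))
```

Since every term of the sum is non-negative, the average is zero only when every per-symbol distortion is zero, which is why the vector extension inherits the property that the distortion vanishes if and only if the two sequences coincide.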

Consider the network depicted in Figure 1-b. The sources encode their data at rates $R_{11}$ and $R_{12}$ by using encoders $en_{X_1}$ and $en_{X_2}$, respectively. The receiver decodes the received data by using decoder $r$. Hence, we have

\[ en_{X_1} : \mathcal{X}_1^n \to \{1, \ldots, 2^{nR_{11}}\}, \qquad en_{X_2} : \mathcal{X}_2^n \to \{1, \ldots, 2^{nR_{12}}\}, \]

and a decoder map

\[ r : \{1, \ldots, 2^{nR_{11}}\} \times \{1, \ldots, 2^{nR_{12}}\} \to \mathcal{Z}^n. \]

The probability of error is

\[ P_e^n = \Pr[\{(\mathbf{x}_1, \mathbf{x}_2) : d(f(\mathbf{x}_1, \mathbf{x}_2), r(en_{X_1}(\mathbf{x}_1), en_{X_2}(\mathbf{x}_2))) > D\}]. \]

We say a rate pair $(R_{11}, R_{12})$ is achievable up to distortion $D$ if there exist $en_{X_1}$, $en_{X_2}$ and $r$ such that $P_e^n \to 0$ when $n \to \infty$.

Our aim is to find feasible rates for different links of the network shown in Figure 1-b when the receiver wants to compute $f(X_1, X_2)$ up to distortion $D$.

B. Prior Results 

In this part, we overview prior relevant work. Consider the network shown in Figure 1-a. For this network, in [10], Yamamoto gives a characterization of a rate-distortion function for the side information functional compression problem (i.e., $X_2$ is available at the receiver). The rate-distortion function proposed in [10] is a generalization of the Wyner-Ziv side-information rate-distortion function.

TABLE I
Research progress on nonzero-distortion source coding problems

Problem types    | $f(X_1, X_2) = (X_1, X_2)$  | General $f(X_1, X_2)$
-----------------+-----------------------------+-----------------------
Side information | Wyner and Ziv               | Feng et al. [11]
                 |                             | Yamamoto [10]
                 |                             | Doshi et al.
Distributed      | Coleman et al.              |
                 | Berger and Yeung [12]       |
                 | Barros and Servetto [13]    |
                 | Wagner et al.               |

Specifically, Yamamoto gives the rate distortion function as follows:

Theorem 51. The rate distortion function for the functional compression problem with side information is

\[ R(D) = \min_{p \in \mathcal{P}(D)} I(W_1; X_1 | X_2), \]

where $\mathcal{P}(D)$ is the collection of all distributions on $W_1$ given $X_1$ such that there exists a $g : \mathcal{W}_1 \times \mathcal{X}_2 \to \mathcal{Z}$ satisfying $E[d(f(X_1, X_2), g(W_1, X_2))] \le D$.

This is an extension of the Wyner-Ziv rate-distortion result. Further, the variable $W_1 \in \Gamma(G_{X_1})$ in the definition of the Orlitsky-Roche rate, Definition 10 (a variable over the independent sets of $G_{X_1}$), can be seen as an interpretation of Yamamoto's auxiliary variable, $W_1$, for the zero-distortion case.

A new characterization of the rate distortion function given by Yamamoto was discussed in [3]. It was shown in [3] that finding a suitable reconstruction function, $\hat{f}$, is equivalent to finding $g$ on $\mathcal{W}_1 \times \mathcal{X}_2$ from Theorem 51. Let $\mathcal{F}_m(D)$ denote the set of all functions $\hat{f}_m : \mathcal{X}_1^m \times \mathcal{X}_2^m \to \mathcal{Z}^m$ such that

\[ \lim_{n \to \infty} E[d(f(\mathbf{X}_1, \mathbf{X}_2), \hat{f}_m(\mathbf{X}_1, \mathbf{X}_2))] \le D, \]

and let $\mathcal{F}(D) = \bigcup_{m \in \mathbb{N}} \mathcal{F}_m(D)$. Also, let $G_{X_1,\hat{f}}$ denote the characteristic graph of $X_1$ with respect to $X_2$, $p(x_1, x_2)$, and $\hat{f}$ for any $\hat{f} \in \mathcal{F}(D)$. For each $m$ and all functions $\hat{f} \in \mathcal{F}(D)$, denote for brevity the normalized graph entropy $\frac{1}{m} H_{G_{X_1,\hat{f}}}(\mathbf{X}_1|\mathbf{X}_2)$ as $H_{G_{X_1,\hat{f}}}(X_1|X_2)$. The following theorem was given in [3]:

Theorem 52. A rate distortion function for the network shown in Figure 1-a can be expressed as follows:

\[ R(D) = \inf_{\hat{f} \in \mathcal{F}(D)} H_{G_{X_1,\hat{f}}}(X_1|X_2). \]

The problem of finding an appropriate function $\hat{f}$ is equivalent to finding a new graph whose edges are a subset of the edges of the characteristic graph. A graph parameterization by $D$ was proposed in [3] to look at a subset of $\mathcal{F}(D)$. The resulting bound is not tight, but it provides a practical technique to tackle a very difficult problem.

Define the $D$-characteristic graph of $X_1$ with respect to $X_2$, $p(x_1, x_2)$, and $f(X_1, X_2)$, as having vertices $V = \mathcal{X}_1$, where the pair $(x_1^1, x_1^2)$ is an edge if there exists some $x_2^1 \in \mathcal{X}_2$ such that $p(x_1^1, x_2^1) p(x_1^2, x_2^1) > 0$ and $d(f(x_1^1, x_2^1), f(x_1^2, x_2^1)) > D$. We denote this graph by $G_{X_1}(D)$. Because $d(z_1, z_2) = 0$ if and only if $z_1 = z_2$, the 0-characteristic graph is the characteristic graph (i.e., $G_{X_1}(0) = G_{X_1}$). The following corollary was given in [3]:

Corollary 53. The rate $H_{G_{X_1}(D)}(X_1|X_2)$ is achievable.
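The $D$-characteristic graph defined above can be built directly from its definition on small alphabets. The Python sketch below is an illustration under stated assumptions: the helper name `d_characteristic_edges`, the uniform distribution, the function $f(x_1, x_2) = x_1 + x_2$ and the absolute-error distortion are all hypothetical choices for the example.

```python
from itertools import combinations

def d_characteristic_edges(xs1, xs2, p, f, d, D):
    """Edges (a, b) of the D-characteristic graph of X1 with respect to X2.

    a and b are connected if some x2 has p(a,x2)p(b,x2) > 0 and
    d(f(a,x2), f(b,x2)) > D.
    """
    edges = set()
    for a, b in combinations(xs1, 2):
        if any(p(a, y) * p(b, y) > 0 and d(f(a, y), f(b, y)) > D for y in xs2):
            edges.add((a, b))
    return edges

xs1 = [0, 1, 2, 3]
xs2 = [0, 1]
p = lambda x1, x2: 1 / 8            # full-support toy distribution
f = lambda x1, x2: x1 + x2
d = lambda a, b: abs(a - b)

for D in (0, 1, 2, 3):
    print(D, sorted(d_characteristic_edges(xs1, xs2, p, f, d, D)))
```

As $D$ grows, edges disappear: at $D = 0$ the graph coincides with the ordinary characteristic graph, and at $D = 3$ it has no edges at all in this toy example, so a single color suffices. Fewer edges allow coarser colorings and hence lower rates.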
C. Main Results 

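This section contains our contributions to this problem. Our aim is to find a rate-distortion region for the network shown in Figure 1-b. Recall the Yamamoto rate distortion function (Theorem 51) and Theorem 52. These theorems describe a rate distortion function for the side information problem. Now, we consider the case of distributed functional compression.

Again, for any $m$, let $\mathcal{F}_m(D)$ denote the set of all functions $\hat{f}_m : \mathcal{X}_1^m \times \mathcal{X}_2^m \to \mathcal{Z}^m$ such that

\[ \lim_{n \to \infty} E[d(f(\mathbf{X}_1, \mathbf{X}_2), \hat{f}_m(\mathbf{X}_1, \mathbf{X}_2))] \le D. \]

In other words, we consider $n$ blocks of $m$-vectors; thus, the functions in the expectation above will be on $\mathcal{X}_1^{mn} \times \mathcal{X}_2^{mn}$. Let $\mathcal{F}(D) = \bigcup_{m \in \mathbb{N}} \mathcal{F}_m(D)$. Let $G_{X_1,\hat{f}}$ denote the characteristic graph of $X_1$ with respect to $X_2$, $p(x_1, x_2)$, and $\hat{f}$ for any $\hat{f} \in \mathcal{F}(D)$, and let $G_{X_2,\hat{f}}$ denote the characteristic graph of $X_2$ with respect to $X_1$, $p(x_1, x_2)$, and $\hat{f}$ for any $\hat{f} \in \mathcal{F}(D)$. For each $m$ and all functions $\hat{f} \in \mathcal{F}(D)$, denote for brevity the normalized graph entropies $\frac{1}{m} H_{G_{X_1,\hat{f}}}(\mathbf{X}_1|\mathbf{X}_2)$ as $H_{G_{X_1,\hat{f}}}(X_1|X_2)$, $\frac{1}{m} H_{G_{X_2,\hat{f}}}(\mathbf{X}_2|\mathbf{X}_1)$ as $H_{G_{X_2,\hat{f}}}(X_2|X_1)$, and $\frac{1}{m} H_{G_{X_1,\hat{f}},G_{X_2,\hat{f}}}(\mathbf{X}_1, \mathbf{X}_2)$ as $H_{G_{X_1,\hat{f}},G_{X_2,\hat{f}}}(X_1, X_2)$.

Now, for a specific function $\hat{f} \in \mathcal{F}(D)$, define $R_{\hat{f}}(D) = (R^{\hat{f}}_{11}(D), R^{\hat{f}}_{12}(D))$ such that

\[
\begin{aligned}
R^{\hat{f}}_{11} &\ge H_{G_{X_1,\hat{f}}}(X_1|X_2), \\
R^{\hat{f}}_{12} &\ge H_{G_{X_2,\hat{f}}}(X_2|X_1), \\
R^{\hat{f}}_{11} + R^{\hat{f}}_{12} &\ge H_{G_{X_1,\hat{f}},G_{X_2,\hat{f}}}(X_1, X_2).
\end{aligned}
\tag{34}
\]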
Theorem 54. A rate-distortion region for the network shown in Figure 1-b is determined by $\bigcup_{\hat{f} \in \mathcal{F}(D)} R_{\hat{f}}(D)$.

Proof: We want to show that $\bigcup_{\hat{f} \in \mathcal{F}(D)} R_{\hat{f}}(D)$ determines a rate-distortion region for the considered network. We first show that this rate-distortion region is achievable for any $\hat{f} \in \mathcal{F}(D)$, and then we prove that every achievable rate region is a subregion of it (converse).

According to Theorem 38, $R_{\hat{f}}(D)$ is sufficient to determine the function $\hat{f}(X_1, X_2)$ at the receiver. Also, by definition,

\[ \lim_{n \to \infty} E[d(f(\mathbf{X}_1, \mathbf{X}_2), \hat{f}(\mathbf{X}_1, \mathbf{X}_2))] \le D. \]

Thus, for a specific $\hat{f} \in \mathcal{F}(D)$, $R_{\hat{f}}(D)$ is achievable. Therefore, the union of these achievable regions for different $\hat{f} \in \mathcal{F}(D)$ (i.e., $\bigcup_{\hat{f} \in \mathcal{F}(D)} R_{\hat{f}}(D)$) is also achievable.

Next, we show that any achievable rate region is a subregion of $\bigcup_{\hat{f} \in \mathcal{F}(D)} R_{\hat{f}}(D)$. Assume that we have an achievable scheme in which source 1 encodes its data to $en_{X_1}(\mathbf{X}_1)$ and source 2 encodes its data to $en_{X_2}(\mathbf{X}_2)$. At the receiver, we compute $r(en_{X_1}(\mathbf{X}_1), en_{X_2}(\mathbf{X}_2))$. Since it is an achievable scheme up to distortion $D$, there exists $\hat{f} \in \mathcal{F}(D)$ such that $r(en_{X_1}(\mathbf{X}_1), en_{X_2}(\mathbf{X}_2)) = \hat{f}(\mathbf{X}_1, \mathbf{X}_2)$. Thus, considering Theorem 38, this achievable rate-distortion region is a subregion of $\bigcup_{\hat{f} \in \mathcal{F}(D)} R_{\hat{f}}(D)$. This completes the proof. ■

Next, we present a simple scheme which satisfies Theorem 54. Again, the problem of finding an appropriate function $\hat{f}$ is equivalent to finding a new graph whose edges are a subset of the edges of the characteristic graph of random variables. This motivates Corollary 55, where we use a similar graph parameterization by $D$. Our scheme is as follows:

Define the $D$-characteristic graph of $X_1$ with respect to $X_2$, $p(x_1, x_2)$, and $f(X_1, X_2)$, as having vertices $V = \mathcal{X}_1$, where the pair $(x_1^1, x_1^2)$ is an edge if there exists some $x_2^1 \in \mathcal{X}_2$ such that $p(x_1^1, x_2^1) p(x_1^2, x_2^1) > 0$ and $d(f(x_1^1, x_2^1), f(x_1^2, x_2^1)) > D$. Denote this graph as $G_{X_1}(D)$. Similarly, we define $G_{X_2}(D)$. Following Corollary 53 and Theorem 54, we have the following corollary:

Corollary 55. For independent sources, if the distortion function is a metric and $(R_{11}, R_{12})$ satisfies the following conditions, then $(R_{11}, R_{12})$ is achievable:

\[
\begin{aligned}
R_{11} &\ge H_{G_{X_1}(D/2)}(X_1), \\
R_{12} &\ge H_{G_{X_2}(D/2)}(X_2), \\
R_{11} + R_{12} &\ge H_{G_{X_1}(D/2),G_{X_2}(D/2)}(X_1, X_2).
\end{aligned}
\tag{35}
\]

Proof: From Theorem 38, by sending colorings of high probability subgraphs of the sources' $D/2$-characteristic graphs satisfying C.C.C., one can achieve the rate region described in (35). For simplicity, we assume the power of the graphs is one. Extensions to an arbitrary power are analogous. Suppose the receiver gets two colors from the sources (say $c_1$ from source 1, and $c_2$ from source 2). To show that the receiver is able to compute its desired function up to distortion level $D$, we need to show that for every $(x_1^1, x_2^1)$ and $(x_1^2, x_2^2)$ such that $c_{G_{X_1}(D/2)}(x_1^1) = c_{G_{X_1}(D/2)}(x_1^2)$ and $c_{G_{X_2}(D/2)}(x_2^1) = c_{G_{X_2}(D/2)}(x_2^2)$, we have $d(f(x_1^1, x_2^1), f(x_1^2, x_2^2)) \le D$. Since the distortion function $d$ is a metric, we have

\[
\begin{aligned}
d(f(x_1^1, x_2^1), f(x_1^2, x_2^2)) &\le d(f(x_1^1, x_2^1), f(x_1^2, x_2^1)) + d(f(x_1^2, x_2^1), f(x_1^2, x_2^2)) \\
&\le D/2 + D/2 = D.
\end{aligned}
\tag{36}
\]

This completes the proof. ■
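The triangle-inequality argument in (36) can be checked numerically on a toy instance. The Python sketch below makes several assumptions that are not part of the paper: block length one, full-support independent sources (so the probability condition in the edge definition is always met), a simple greedy coloring instead of a minimum entropy coloring, and hypothetical helper names, alphabets and function.

```python
from itertools import combinations, product

def d_graph(xs, others, f_fixed, d, D):
    """Edges of a D-characteristic graph under a full-support distribution."""
    return {
        (a, b)
        for a, b in combinations(xs, 2)
        if any(d(f_fixed(a, o), f_fixed(b, o)) > D for o in others)
    }

def greedy_coloring(xs, edges):
    """Give each vertex the smallest color unused by its already colored neighbors."""
    color = {}
    for v in xs:
        taken = {color[u] for u in color if (u, v) in edges or (v, u) in edges}
        color[v] = next(c for c in range(len(xs)) if c not in taken)
    return color

def max_distortion_within_classes(xs1, xs2, f, d, c1, c2):
    """Largest d(f(x1,x2), f(x1',x2')) over pairs that share both colors."""
    worst = 0
    for (a1, a2), (b1, b2) in product(product(xs1, xs2), repeat=2):
        if c1[a1] == c1[b1] and c2[a2] == c2[b2]:
            worst = max(worst, d(f(a1, a2), f(b1, b2)))
    return worst

xs1, xs2 = range(6), range(4)
f = lambda x1, x2: x1 + 2 * x2
d = lambda a, b: abs(a - b)          # absolute error is a metric
D = 4
g1 = d_graph(xs1, xs2, f, d, D / 2)
g2 = d_graph(xs2, xs1, lambda x2, x1: f(x1, x2), d, D / 2)
c1 = greedy_coloring(xs1, g1)
c2 = greedy_coloring(xs2, g2)
print(c1, c2, max_distortion_within_classes(xs1, xs2, f, d, c1, c2))
```

In this example, any two source pairs that the receiver cannot tell apart from the received colors yield function values at distance at most $D$, as the corollary predicts. Using threshold $D$ instead of $D/2$ for each graph would not give this guarantee in general, since the two per-source errors can add up.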

VI. Polynomial Time Cases for Finding the Minimum Entropy Coloring of a Characteristic Graph

In this section, we consider the problem of finding the minimum entropy coloring of a char- 
acteristic graph. This problem arises in the functional compression problem where computation 
of a function of sources is desired at the receiver. We considered some aspects of this problem 
in Sections [|n]- [V]l and proposed some coding schemes. In those proposed coding schemes, one 
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needs to compute the minimum entropy coloring (a coloring random variable which minimizes 
the entropy) of a characteristic graph. In general, finding this coloring is an NP-hard problem 
(as shown by Cardinal et al. [fT51 ) . However, in this section, we show that depending on the 
characteristic graph's structure, there are some interesting cases where finding the minimum 
entropy coloring is not NP-hard, but tractable and practical. In one of these cases, we show that, 
having a non-zero joint probability condition on random variables' distributions, for any desired 
function /, makes characteristic graphs to be formed of some non-overlapping fully-connected 
maximal independent sets. Therefore, the minimum entropy coloring can be solved in polynomial 
time. In another case, we show that, if the function we seek to compute is a type of quantization 
functions, this problem is also tractable. 

We also consider this problem in a general case. By using Huffman or Lempel-Ziv coding 
notions, we heuristically relate finding the minimum entropy coloring to finding the maximum 
independent set of a graph. While the minimum-entropy coloring problem is a recently studied 
problem, there are some heuristic algorithms to solve approximately the maximum independent 
set problem. 

We begin this section by stating the problem setup. Then, we explain our contributions to
this problem.

A. Problem Setup 

In some problems such as the functional compression problem, we need to find a coloring 
random variable of a characteristic graph which minimizes the entropy. The problem is how to 
compute such a coloring for a given characteristic graph. 

Given a characteristic graph $G_{X_1}$ (or its n-th power, $G_{X_1}^n$), one can assign different colors
to its vertices. Suppose $C_{G_{X_1}}$ is the collection of all valid colorings of this graph. Among
these colorings, one which minimizes the entropy of the coloring random variable is called the
minimum-entropy coloring, and we refer to it by $c_{G_{X_1}}^{min}$:

$$c_{G_{X_1}}^{min} = \arg\min_{c_{G_{X_1}} \in C_{G_{X_1}}} H(c_{G_{X_1}}). \qquad (37)$$

The problem is how to compute $c_{G_{X_1}}^{min}$ given $G_{X_1}$.
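To make the optimization in (37) concrete, the following Python sketch enumerates every valid coloring of a very small graph and keeps one with the smallest entropy. It is only an illustration under stated assumptions: the function names, the three-vertex path graph and the vertex probabilities are hypothetical and do not come from the paper, and the search is exponential in the number of vertices, which is consistent with the hardness of the general problem.

```python
from itertools import product
from math import log2

def coloring_entropy(coloring, p):
    """Entropy in bits of the color random variable induced by `coloring`,
    where p[v] is the probability of vertex v."""
    mass = {}
    for v, c in enumerate(coloring):
        mass[c] = mass.get(c, 0.0) + p[v]
    return -sum(q * log2(q) for q in mass.values() if q > 0)

def min_entropy_coloring(n, edges, p):
    """Exhaustive search over all valid colorings of an n-vertex graph."""
    best, best_h = None, float("inf")
    for coloring in product(range(n), repeat=n):
        if any(coloring[u] == coloring[v] for u, v in edges):
            continue  # adjacent vertices share a color: not a valid coloring
        h = coloring_entropy(coloring, p)
        if h < best_h - 1e-12:
            best, best_h = coloring, h
    return best, best_h

# Hypothetical example: the path 0 - 1 - 2 with a non-uniform distribution.
edges = [(0, 1), (1, 2)]
p = [0.25, 0.5, 0.25]
best, best_h = min_entropy_coloring(3, edges, p)
print(best, round(best_h, 3))
```

In this toy case the two end vertices share a color, so the coloring random variable has two equally likely values and an entropy of one bit, whereas giving all three vertices distinct colors would cost 1.5 bits.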







Fig. 19. Having non-zero joint probability distribution, a) maximal independent sets cannot overlap with each other (this 
figure is to depict the contradiction) b) maximal independent sets should be fully connected to each other. In this figure, a solid 
line represents a connection, and a dashed line means no connection exists. 



B. Main Results 

In general, finding $c_{G_{X_1}}^{min}$ is an NP-hard problem ([15]). However, in this section, we investigate
cases where this coloring can be computed in polynomial time, depending on the characteristic
graph's structure. In one of these cases, we show that, by having a non-zero joint probability
condition on random variables' distributions, for any desired function, finding $c_{G_{X_1}}^{min}$ can be
solved in polynomial time. In another case, we show that, if the function we seek to compute
is a quantization function, this problem is also tractable. We also consider this problem in a
general case. By using Huffman or Lempel-Ziv coding notions, we heuristically relate finding
the minimum entropy coloring to finding the maximum independent set of a graph.

For simplicity, we consider functions with two input random variables, but one can extend all 
discussions to functions with more input random variables than two. 

1) Non-Zero Joint Probability Distribution Condition: Consider the network shown in Figure
1-b. Source random variables have a joint probability distribution $p(x_1, x_2)$, and the receiver
wishes to compute a deterministic function of sources (i.e., $f(X_1, X_2)$). In Section II, we showed
that, in an achievable coding scheme, one needs to compute minimum entropy colorings of
characteristic graphs. The question is how source nodes can compute minimum entropy colorings
of their characteristic graphs $G_{X_1}$ and $G_{X_2}$ (or, similarly, the minimum entropy colorings of
$G_{X_1}^n$ and $G_{X_2}^n$, for some $n$). For an arbitrary graph, this problem is NP-hard ([15]). However, in
certain cases, depending on the probability distribution or the desired function, the characteristic
graph has some special structure which leads to a tractable scheme to find the minimum entropy
coloring. In this section, we consider the effect of the probability distribution.

Theorem 56. Suppose for all $(x_1, x_2) \in \mathcal{X}_1 \times \mathcal{X}_2$, $p(x_1, x_2) > 0$. Then, maximal independent sets
of the characteristic graph $G_{X_1}$ (and its n-th power $G_{X_1}^n$, for any $n$) are some non-overlapping
fully-connected sets. Under this condition, the minimum entropy coloring can be achieved by
assigning different colors to its different maximal independent sets.

Proof: Suppose $\Gamma(G_{X_1})$ is the set of all maximal independent sets of $G_{X_1}$. Let us proceed
by contradiction. Consider Figure 19-a. Suppose $w_1$ and $w_2$ are two different non-empty maximal
independent sets. Without loss of generality, assume $x_1^1$ and $x_1^2$ are in $w_1$, and $x_1^2$ and $x_1^3$ are in $w_2$.
These sets have a common element $x_1^2$. Since $w_1$ and $w_2$ are two different maximal independent
sets, $x_1^1 \notin w_2$ and $x_1^3 \notin w_1$. Since $x_1^1$ and $x_1^2$ are in $w_1$, there is no edge between them in $G_{X_1}$.
The same argument holds for $x_1^2$ and $x_1^3$. But we have an edge between $x_1^1$ and $x_1^3$, because $w_1$
and $w_2$ are two different maximal independent sets, and at least one such edge should exist
between them. Now, we want to show that this is not possible.

Since there is no edge between $x_1^1$ and $x_1^2$, for any $x_2^1 \in \mathcal{X}_2$, $p(x_1^1, x_2^1) p(x_1^2, x_2^1) > 0$, and
$f(x_1^1, x_2^1) = f(x_1^2, x_2^1)$. A similar argument can be expressed for $x_1^2$ and $x_1^3$. In other words,
for any $x_2^1 \in \mathcal{X}_2$, $p(x_1^2, x_2^1) p(x_1^3, x_2^1) > 0$, and $f(x_1^2, x_2^1) = f(x_1^3, x_2^1)$. Thus, for all $x_2^1 \in \mathcal{X}_2$,
$p(x_1^1, x_2^1) p(x_1^3, x_2^1) > 0$, and $f(x_1^1, x_2^1) = f(x_1^3, x_2^1)$. However, since $x_1^1$ and $x_1^3$ are connected to
each other, there should exist an $x_2^1 \in \mathcal{X}_2$ such that $f(x_1^1, x_2^1) \neq f(x_1^3, x_2^1)$, which is not possible.
So, the contradiction assumption is not correct and these two maximal independent sets do not
overlap with each other.

We showed that maximal independent sets cannot overlap with each other. Now, we want
to show that they are also fully connected to each other. Again, let us proceed by contradiction.
Consider Figure 19-b. Suppose $w_1$ and $w_2$ are two different non-overlapping maximal independent
sets. Suppose there exists an element in $w_2$ (call it $x_1^3$) which is connected to one of the elements
in $w_1$ (call it $x_1^1$) and is not connected to another element of $w_1$ (call it $x_1^2$). By an argument
similar to the one in the previous paragraph, this is not possible: $x_1^1$ and $x_1^2$ are not connected,
and $x_1^2$ and $x_1^3$ are not connected, so $f$ takes the same value at $x_1^1$ and $x_1^3$ for every element of $\mathcal{X}_2$,
which contradicts the edge between $x_1^1$ and $x_1^3$. Thus, $x_1^3$
should be connected to $x_1^2$. Therefore, if for all $(x_1, x_2) \in \mathcal{X}_1 \times \mathcal{X}_2$, $p(x_1, x_2) > 0$, then maximal
independent sets of $G_{X_1}$ are some separate fully-connected sets. In other words, the complement
of $G_{X_1}$ is formed by some non-overlapping cliques. Finding the minimum entropy coloring of
this graph is trivial and can be achieved by assigning different colors to these non-overlapping
fully-connected maximal independent sets.

Fig. 20. Having non-zero joint probability condition is necessary for Theorem 56. A dark square represents a zero probability
point.

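This argument also holds for any power of $G_{X_1}$. Suppose $\mathbf{x}_1^1$, $\mathbf{x}_1^2$ and $\mathbf{x}_1^3$ are some typical
sequences in $\mathcal{X}_1^n$. If $\mathbf{x}_1^1$ is not connected to $\mathbf{x}_1^2$ and $\mathbf{x}_1^3$, it is not possible to have $\mathbf{x}_1^2$ and $\mathbf{x}_1^3$
connected. Therefore, one can apply a similar argument to prove the theorem for $G_{X_1}^n$, for some
$n$. This completes the proof. ■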
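The structure described by Theorem 56 can be checked numerically on a small instance. The Python sketch below is a minimal illustration and not the authors' implementation: the helper names `characteristic_graph` and `coloring_full_support`, as well as the function table `f` and the uniform distribution `p`, are assumptions introduced here. Under full support, two vertices of $G_{X_1}$ are non-adjacent exactly when their rows of the function table coincide, so grouping identical rows yields the maximal independent sets.

```python
from itertools import combinations

def characteristic_graph(f, p):
    """Edge set of G_{X_1}: i ~ j iff some column k has
    p[i][k] * p[j][k] > 0 and f[i][k] != f[j][k]."""
    n1, n2 = len(f), len(f[0])
    return {
        (i, j)
        for i, j in combinations(range(n1), 2)
        if any(p[i][k] * p[j][k] > 0 and f[i][k] != f[j][k] for k in range(n2))
    }

def coloring_full_support(f):
    """Assumes p > 0 everywhere: vertices are non-adjacent iff their rows of f
    coincide, so each distinct row gets its own color."""
    color_of_row = {}
    return [color_of_row.setdefault(tuple(row), len(color_of_row)) for row in f]

# Hypothetical example: |X_1| = 4, |X_2| = 2, uniform (hence non-zero) joint pmf.
f = [[0, 1], [0, 1], [1, 1], [1, 1]]
p = [[1 / 8] * 2 for _ in range(4)]
edges = characteristic_graph(f, p)
colors = coloring_full_support(f)
# Two vertices should be adjacent exactly when they received different colors.
same_color_iff_no_edge = all(
    ((i, j) in edges) == (colors[i] != colors[j])
    for i, j in combinations(range(len(f)), 2)
)
print(colors, sorted(edges), same_color_iff_no_edge)
```

Grouping rows takes time linear in the size of the function table, which is one way to see why the coloring is easy to find under the non-zero probability condition.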

Here are some remarks about Theorem 56:

• If the characteristic graph satisfying conditions of Theorem 56 is sparse, its power graph
would also remain sparse (a sparse graph with $m$ vertices is a graph whose number of edges
is much smaller than $\frac{m(m-1)}{2}$).

• The condition $p(x_1, x_2) > 0$, for all $(x_1, x_2) \in \mathcal{X}_1 \times \mathcal{X}_2$, is a necessary condition for Theorem
56. In order to illustrate this, consider Figure 20. In this example, $x_1^1$, $x_1^2$ and $x_1^3$ are in $\mathcal{X}_1$,
and $x_2^1$, $x_2^2$ and $x_2^3$ are in $\mathcal{X}_2$. Suppose $p(x_1^2, x_2^2) = 0$. By considering the value of function
$f$ at these points depicted in the figure, one can see that, in $G_{X_1}$, $x_1^2$ is not connected to $x_1^1$
and $x_1^3$. However, $x_1^1$ and $x_1^3$ are connected to each other. Thus, Theorem 56 does not hold
here.

• The condition used in Theorem 56 only restricts the probability distribution and it does not
depend on the function $f$. Thus, for any function $f$ at the receiver, if we have a non-zero
joint probability distribution of source random variables (for example, when source random
variables are independent), finding the minimum-entropy coloring is easy and tractable.
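The second remark can be reproduced with a small numerical sketch. The function values and probabilities below are hypothetical (they are not the values shown in Figure 20), and indices are 0-based, so vertex 1 plays the role of $x_1^2$ and column 1 the role of $x_2^2$. With a single zero-probability point, vertex 1 is adjacent to neither of the other two vertices, yet those two are adjacent, so the maximal independent sets overlap.

```python
from itertools import combinations

def characteristic_graph(f, p):
    """Edge set of G_{X_1}: i ~ j iff some column k has
    p[i][k] * p[j][k] > 0 and f[i][k] != f[j][k]."""
    n1, n2 = len(f), len(f[0])
    return {
        (i, j)
        for i, j in combinations(range(n1), 2)
        if any(p[i][k] * p[j][k] > 0 and f[i][k] != f[j][k] for k in range(n2))
    }

# Hypothetical 3 x 3 function table; rows index X_1, columns index X_2.
f = [[0, 1, 0],
     [0, 5, 0],
     [0, 2, 0]]
# Uniform mass on eight points, with a single zero at (row 1, column 1).
p = [[1 / 8] * 3 for _ in range(3)]
p[1][1] = 0.0

edges = characteristic_graph(f, p)
# The independent sets {0, 1} and {1, 2} are both maximal and share vertex 1.
print(sorted(edges))
```

Here a minimum entropy coloring must still separate vertices 0 and 2, but vertex 1 may join either class, so the choice depends on the probabilities and is no longer determined by the graph structure alone.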
2) Quantization Functions: In Section VI-B1, we introduced a condition on the joint
probability distribution of random variables which leads to a specific structure of the characteristic
graph so that finding the minimum entropy coloring is not NP-hard. In this section, we consider
some special functions which lead to some special graph structures.

An interesting function is a quantization function. A natural quantization function is a function
which separates the $X_1$-$X_2$ plane into some rectangles such that each rectangle corresponds to
a different value of that function. Sides of these rectangles are parallel to the plane axes. Figure
21-a depicts such a quantization function.

Given a quantization function, one can extend different sides of each rectangle in the $X_1$-$X_2$
plane. This may make some new rectangles. We call each of them a function region. Each
function region can be determined by two subsets of $\mathcal{X}_1$ and $\mathcal{X}_2$. For example, in Figure 21-b,
one of the function regions is distinguished by the shaded area.

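Definition 57. Consider two function regions $\mathcal{X}_1^1 \times \mathcal{X}_2^1$ and $\mathcal{X}_1^2 \times \mathcal{X}_2^2$. If for any $x_1^1 \in \mathcal{X}_1^1$ and
$x_1^2 \in \mathcal{X}_1^2$, there exists $x_2^1$ such that $p(x_1^1, x_2^1) p(x_1^2, x_2^1) > 0$ and $f(x_1^1, x_2^1) \neq f(x_1^2, x_2^1)$, we say these
two function regions are pairwise $X_1$-proper.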

Theorem 58. Consider a quantization function $f$ such that its function regions are pairwise
$X_1$-proper. Then, $G_{X_1}$ (and $G_{X_1}^n$, for any $n$) is formed of some non-overlapping fully-connected
maximal independent sets, and its minimum entropy coloring can be achieved by assigning
different colors to different maximal independent sets.

Proof: We first prove it for $G_{X_1}$. Suppose $\mathcal{X}_1^1 \times \mathcal{X}_2^1$ and $\mathcal{X}_1^2 \times \mathcal{X}_2^2$ are two $X_1$-proper
function regions of a quantization function $f$, where $\mathcal{X}_1^1 \neq \mathcal{X}_1^2$. We show that $\mathcal{X}_1^1$ and $\mathcal{X}_1^2$ are
two non-overlapping fully-connected maximal independent sets. By definition, $\mathcal{X}_1^1$ and $\mathcal{X}_1^2$ are
two non-equal partition sets of $\mathcal{X}_1$. Thus, they do not have any element in common.

Now, we want to show that vertices of each of these partition sets are not connected to each
other. Without loss of generality, we show it for $\mathcal{X}_1^1$. If this partition set of $\mathcal{X}_1$ has only one
element, this is a trivial case. So, suppose $x_1^1$ and $x_1^2$ are two elements in $\mathcal{X}_1^1$. By definition
of function regions, one can see that, for any $x_2^1 \in \mathcal{X}_2$ such that $p(x_1^1, x_2^1) p(x_1^2, x_2^1) > 0$, we have
$f(x_1^1, x_2^1) = f(x_1^2, x_2^1)$. Thus, these two vertices are not connected to each other. Now, suppose
$x_1^3$ is an element in $\mathcal{X}_1^2$. Since these function regions are $X_1$-proper, there should exist at least
one $x_2^1 \in \mathcal{X}_2$ such that $p(x_1^1, x_2^1) p(x_1^3, x_2^1) > 0$ and $f(x_1^1, x_2^1) \neq f(x_1^3, x_2^1)$. Thus, $x_1^1$ and $x_1^3$
are connected to each other. Therefore, $\mathcal{X}_1^1$ and $\mathcal{X}_1^2$ are two non-overlapping fully-connected
maximal independent sets. One can easily apply this argument to other partition sets. Thus, the
minimum entropy coloring can be achieved by assigning different colors to different maximal
independent sets (partition sets). The proof for $G_{X_1}^n$, for any $n$, is similar to the one mentioned
in Theorem 56. This completes the proof. ■

Fig. 21. a) A quantization function. Function values are depicted in the figure on each rectangle. b) By extending sides of
rectangles, the plane is covered by some function regions.
Note that without the $X_1$-proper condition of Theorem 58, assigning different colors to different
partitions still leads to an achievable coloring scheme. However, it is not necessarily the minimum
entropy coloring. In other words, without this condition, maximal independent sets may overlap.
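As a rough illustration of Definition 57 and Theorem 58, the Python sketch below works on a discretized quantization function given as a table. It rests on assumptions that are not in the paper: the function regions are represented only through the partition of $\mathcal{X}_1$ they induce (consecutive rows of the table that agree everywhere belong to the same partition set), the pairwise $X_1$-proper condition is checked between distinct partition sets, and the names `x1_partition` and `pairwise_x1_proper` and both example tables are hypothetical.

```python
def x1_partition(f):
    """Group consecutive x1 indices whose rows of f agree, i.e. indices with
    no extended rectangle side between them."""
    blocks = [[0]]
    for i in range(1, len(f)):
        if f[i] == f[i - 1]:
            blocks[-1].append(i)
        else:
            blocks.append([i])
    return blocks

def pairwise_x1_proper(f, p, blocks):
    """Check, for every pair of distinct partition sets and every pair of
    elements (i, j) drawn from them, that some column k has
    p[i][k] * p[j][k] > 0 and f[i][k] != f[j][k]."""
    n2 = len(f[0])
    for a in range(len(blocks)):
        for b in range(a + 1, len(blocks)):
            for i in blocks[a]:
                for j in blocks[b]:
                    if not any(p[i][k] * p[j][k] > 0 and f[i][k] != f[j][k]
                               for k in range(n2)):
                        return False
    return True

# Hypothetical quantization function with four rectangles and full support.
f = [[1, 1, 2], [1, 1, 2], [3, 4, 4], [3, 4, 4]]
p = [[1 / 12] * 3 for _ in range(4)]
blocks = x1_partition(f)
proper = pairwise_x1_proper(f, p, blocks)

# Hypothetical case where a zero-probability point breaks the condition.
f2 = [[1, 2], [1, 3]]
p2 = [[0.4, 0.0], [0.3, 0.3]]
blocks2 = x1_partition(f2)
proper2 = pairwise_x1_proper(f2, p2, blocks2)
print(blocks, proper, blocks2, proper2)
```

In the second case the two rows differ only at a point of zero probability, so the two vertices are not connected and a single color would suffice; coloring by partition set remains valid but uses more entropy than necessary, in line with the note above.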

Corollary 59. If a function $f$ is strictly monotonic with respect to $X_1$, and $p(x_1, x_2) \neq 0$, for
all $x_1 \in \mathcal{X}_1$ and $x_2 \in \mathcal{X}_2$, then $G_{X_1}$ (and $G_{X_1}^n$, for any $n$) is a complete graph.

Under conditions of Corollary 59, functional compression does not give us any gain,
because, in a complete graph, one should assign different colors to different vertices. Traditional
compression where $f$ is the identity function is a special case of Corollary 59.

3) Minimum Entropy Coloring for an Arbitrary Graph: Finding the minimum entropy coloring
of an arbitrary graph (whose entropy is called the chromatic entropy) is NP-hard ([15]). Reference [15] showed
that even finding a coloring whose entropy is within $(\frac{1}{7} - \epsilon) \log m$ of its chromatic entropy
is NP-hard, for any $\epsilon > 0$, where $m$ is the number of vertices of the graph. That is one reason
we introduced some special structures on the characteristic graph to have some tractable and
practical schemes to find the minimum entropy coloring. While cases investigated in Sections
VI-B1 and VI-B2 cover certain practical cases, in this part, we want to consider this problem
without assuming any special structure of the graph. In particular, we show that, by using a notion
of an empirical Huffman coding scheme or a Lempel-Ziv coding scheme, one can heuristically
relate the minimum-entropy coloring problem to the maximum independent set
problem. While the minimum-entropy coloring problem is a recently studied problem, there are
some heuristic algorithms to solve the maximum independent set problem [29].

Suppose $G_{X_1}$ is the characteristic graph of $X_1$. Without loss of generality, in this section,
we consider $n = 1$. All discussions can be extended to $G_{X_1}^n$, for any $n$. Suppose $p(x_1)$ is the
probability distribution of $X_1$. Let us define the adjacency matrix $A = [a_{ij}]$ for this graph as
follows: $a_{ij} = 1$ when $x_1^i$ and $x_1^j$ are connected to each other in $G_{X_1}$; otherwise, $a_{ij} = 0$. One
can see that the adjacency matrix is symmetric, with all zeros on its diagonal. A one in this
matrix means that its corresponding vertices should be assigned different colors.

Let us define a permutation matrix $P$ with the same size as $A$. This matrix has exactly one 1 in
each of its rows and columns. The matrix $PAP^T$ is a matrix in which the rows and columns
of $A$ are reordered simultaneously, with respect to this permutation matrix $P$. For any valid
coloring, there exists a permutation matrix $P$ such that $PAP^T$ has zero square matrices on its
diagonal. This reordering is such that vertices with the same color are adjacent to each other
in $PAP^T$. Each of these zero square matrices on the diagonal of $PAP^T$ represents a maximal
independent set, or equivalently a color class. One can see that there exists a bijective mapping
between any valid coloring and any permutation matrix $P$ which leads to some zero square
matrices on the diagonal of $PAP^T$.
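The reordering described above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names, the five-vertex cycle and its coloring are hypothetical choices made here (they are not the graph of the paper's example), and the permutation is represented by a vertex ordering rather than by an explicit matrix $P$.

```python
def reorder_by_color(A, coloring):
    """Return (order, B), where B plays the role of P A P^T: rows and columns
    of A are reordered so that vertices of the same color are contiguous."""
    order = sorted(range(len(A)), key=lambda v: coloring[v])
    B = [[A[u][v] for v in order] for u in order]
    return order, B

def diagonal_blocks_are_zero(B, sizes):
    """Check that each diagonal block with the given sizes is all-zero."""
    start = 0
    for s in sizes:
        if any(B[i][j] for i in range(start, start + s)
               for j in range(start, start + s)):
            return False
        start += s
    return True

# Hypothetical example: the cycle 0 - 1 - 2 - 3 - 4 - 0 with a valid 3-coloring.
n = 5
A = [[0] * n for _ in range(n)]
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]:
    A[u][v] = A[v][u] = 1
coloring = [0, 1, 0, 1, 2]

order, B = reorder_by_color(A, coloring)
sizes = [coloring.count(c) for c in sorted(set(coloring))]
print(order, sizes, diagonal_blocks_are_zero(B, sizes))
```

An invalid coloring would place an edge inside one of the diagonal blocks, so the same check doubles as a validity test for a candidate coloring.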

Example 60. As an example, consider the coloring of the graph depicted in Figure 7. This
coloring leads to the following $PAP^T$ matrix:

$$PAP^T = \begin{pmatrix} 0 & D_1 & D_2 \\ D_1^T & 0 & D_3 \\ D_2^T & D_3^T & 0 \end{pmatrix}, \qquad (38)$$

where $D_i$, $i = 1, 2, 3$, are non-zero matrices. Each of the zero square matrices on the diagonal
represents a color class, or a maximal independent set of this graph. The permutation matrix $P$
in this case is,

[Equation (39): a $5 \times 5$ permutation matrix $P$; the positions of its ones are not recoverable from the source.]

Now, we want to take the probability distribution into account. To do this, we repeat each
vertex $x_1^i$ in the adjacency matrix $n_i$ times, such that

$$\frac{p(x_1^i)}{n_i} = \frac{p(x_1^j)}{n_j}$$

for any valid $i$ and $j$. We call the resulting matrix the weighted adjacency matrix and denote it
by $A_w$. The above argument about the permutation remains the same. Any valid coloring can be
represented by a permutation matrix $P$ such that $P A_w P^T$ has some zero square matrices on its
diagonal. Since we represent the probability of each vertex by its number of repetitions in $A_w$,
the relative sizes of the zero square matrices on the diagonal of $P A_w P^T$ represent the
probabilities of the corresponding color classes. In other words, a color class with a larger zero
square matrix has more probability than a color class with a smaller zero square matrix.
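The construction of $A_w$ can be sketched as follows, assuming the vertex probabilities are rational so that integer repetition counts exist. The function names and the three-vertex example are hypothetical, and exact fractions are used only to keep the counts exact.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetition_counts(p):
    """Integers n_i proportional to the rational probabilities p_i,
    so that p_i / n_i is the same for every vertex i."""
    common = reduce(lambda a, b: a * b // gcd(a, b),
                    (q.denominator for q in p), 1)
    return [int(q * common) for q in p]

def weighted_adjacency(A, counts):
    """Repeat vertex v counts[v] times; copies of a vertex inherit its edges
    and are not adjacent to each other, since A has a zero diagonal."""
    expanded = [v for v, n in enumerate(counts) for _ in range(n)]
    return [[A[u][v] for v in expanded] for u in expanded]

# Hypothetical example: the path 0 - 1 - 2 with probabilities 1/4, 1/2, 1/4.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
p = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
counts = repetition_counts(p)
A_w = weighted_adjacency(A, counts)
for row in A_w:
    print(row)
```

The size of $A_w$ grows with the common denominator of the probabilities, so in practice one might prefer to keep $A$ and the probabilities separate and work with weighted independent sets directly; that alternative is a design suggestion made here, not a statement from the paper.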

Now, one can heuristically use the Huffman coding technique to find a coloring (or its corresponding
permutation matrix) to minimize the entropy. To do this, we first find a permutation matrix
which leads to the largest zero square matrix on the diagonal of $P A_w P^T$. Then, we assign a color
to that class, and eliminate its corresponding rows and columns. We repeat this procedure until all
vertices are assigned colors. One can see that finding the largest zero square matrix on
the diagonal of $P A_w P^T$ is equivalent to finding the maximum independent set of a graph. Note
that this is a heuristic algorithm, and it does not necessarily reach the minimum entropy coloring.
The other point is that, here, we have assumed that the probability distribution of $X_1$ is known.
If we do not know this probability distribution, one can use an empirical distribution instead
of the actual distribution. In that case, using a Lempel-Ziv coding notion instead of Huffman
coding leads to a similar algorithm.
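The greedy procedure above can be sketched in Python as follows. Several choices here are assumptions rather than the paper's method: the largest zero block of $P A_w P^T$ is interpreted as the independent set with the largest total probability, that set is found by exhaustive search (exponential time, for illustration only, where a heuristic maximum independent set solver would be used in practice), and the function names and the five-vertex cycle example are hypothetical.

```python
from itertools import combinations

def heaviest_independent_set(vertices, adj, p):
    """Exhaustive search for the independent subset of `vertices` with the
    largest total probability (exponential time, small graphs only)."""
    best, best_mass = (), -1.0
    for r in range(1, len(vertices) + 1):
        for subset in combinations(vertices, r):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                mass = sum(p[v] for v in subset)
                if mass > best_mass:
                    best, best_mass = subset, mass
    return best

def greedy_coloring(n, edges, p):
    """Repeatedly color the heaviest remaining independent set, then remove it."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining, coloring, color = list(range(n)), [None] * n, 0
    while remaining:
        cls = heaviest_independent_set(remaining, adj, p)
        for v in cls:
            coloring[v] = color
        remaining = [v for v in remaining if v not in cls]
        color += 1
    return coloring

# Hypothetical example: the cycle 0 - 1 - 2 - 3 - 4 - 0.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
p = [0.3, 0.1, 0.3, 0.1, 0.2]
coloring = greedy_coloring(5, edges, p)
print(coloring)
```

As with the procedure it illustrates, this sketch returns a valid coloring but gives no guarantee of reaching the minimum entropy.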

VII. Conclusions and Future Work 

In this paper, we considered different aspects of the functional compression problem where
computing a function (or some functions) of sources is desired at the receiver(s). The rate
region of this problem has been considered in the literature under certain restrictive assumptions,
particularly in terms of the network topology, the functions and the characteristics of the
sources. In this paper, we significantly relaxed these assumptions. In Section II of this paper, we
considered this problem for an arbitrary tree network and asymptotically lossless computation
and derived rate lower bounds. We showed that, for one stage tree networks with correlated
sources, or for general trees with independent sources, these lower bounds are tight. For these
cases, we proposed a modularized coding scheme based on graph colorings and Slepian-Wolf
compression which performs arbitrarily closely to rate lower bounds. Optimal computations
that should be performed at intermediate nodes were derived for a general tree network with
independent sources. We showed that, for a family of functions and random variables called
chain rule proper sets, computation at intermediate nodes is not necessary. We also introduced a
new condition on colorings of source random variables' characteristic graphs called the coloring
connectivity condition (C.C.C.) and showed that, unlike the condition mentioned in Doshi et al.,
this condition is necessary and sufficient for any achievable coding scheme based on colorings.
We also showed that, unlike entropy, graph entropy does not satisfy the chain rule.

The problem of having different desired functions with side information at the receiver was
considered in Section III. For this problem, we defined a new concept named multi-functional
graph entropy, an extension of graph entropy defined by Körner to the multi-functional case.
We showed that the minimum achievable rate for this problem with side information is equal
to the conditional multi-functional graph entropy of the source random variable given the side
information. We also proposed a coding scheme based on graph colorings to achieve this rate.

In Section IV, we investigated the effect of having feedback on the rate region of the functional
compression problem. If the function at the receiver is the identity function, this problem reduces
to the Slepian-Wolf compression with feedback, for which having feedback does not increase
the rate. However, we showed that, in general, feedback can improve rate bounds.

The problem of distributed functional compression with distortion was investigated in Section
V. The objective is to compress correlated discrete sources so that an arbitrary deterministic
function of those sources can be computed within a distortion level at the receiver. In this case,
we computed a rate-distortion region for this problem which is not a single letter characterization.
Then, we proposed a simple suboptimal coding scheme with a non-trivial performance guarantee.

In these proposed coding schemes, one needs to compute the minimum entropy coloring of a
characteristic graph. In general, finding this coloring is an NP-hard problem ([15]). However, in
Section VI, we showed that, depending on the characteristic graph's structure, there are certain
cases where finding the minimum entropy coloring is not NP-hard, but tractable and practical. In
one of these cases, we showed that, by having a non-zero joint probability condition on random
variables' distributions, for any desired function, finding the minimum entropy coloring can be
solved in polynomial time. In another case, we showed that, if the desired function is a type of
quantization function, this problem is also tractable.

For possible future work, one may consider a general network topology rather than tree
networks. For instance, one can consider a general multi-source multicast network in which
receivers desire to have a deterministic function of source random variables. For the case of
having the identity function at the receivers, this problem is well-studied in [26], [27] and [28]
under the name of network coding for multi-source multicast networks. Reference [28] shows
that random linear network coding can perform arbitrarily closely to min-cut max-flow bounds.
To have an achievable scheme for the functional version of this problem, one may perform
random network coding on coloring random variables satisfying C.C.C. If receivers desire
different functions, one can use colorings of multi-functional characteristic graphs satisfying
C.C.C., and then use random network coding for these coloring random variables. This achievable
scheme can be extended to disjoint multicast and disjoint multicast plus multicast cases described
in [27]. This scheme is achievable; however, it is not optimal in general. If sources are
independent, one may use encoding/decoding functions derived for tree networks at intermediate
nodes, along with network coding.

Throughout this paper, we considered asymptotically lossless or lossy computation of a
function. For possible future work, one may consider this problem for the zero-error computation of
a function, which leads to a communication complexity problem. One can use tools and schemes
we have developed in this paper to attain some achievable schemes in the zero-error computation
case.